{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 3: LLM-as-Judge - Phoenix Walkthrough "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook walks through the reference implementation for Homework 3 and discusses additional topics. It was reviewed in Priyan's HW review video."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<center>\n",
    "    <p style=\"text-align:left\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://repository-images.githubusercontent.com/564072810/f3666cdf-cb3e-4056-8a25-27cb3e6b5848\" width=\"600\"/>\n",
    "        <br>\n",
    "        <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
    "    </p>\n",
    "</center>\n",
    "\n",
    "## Launch Phoenix \n",
    "\n",
    "First, let's set up Phoenix on our local machine. You should run these commands within your terminal in your chosen environment.\n",
    "\n",
    "(If you have already done this in a previous HW assignment, you are good to go.)\n",
    "\n",
    "**Install Phoenix**\n",
    "\n",
    "```pip install arize-phoenix```\n",
    "\n",
    "**Boot up Phoenix on localhost**\n",
    "\n",
    "```phoenix serve```\n",
    "\n",
    "## Set OpenAI API Key for LiteLLM calls\n",
    "\n",
    "```\n",
    "export OPENAI_API_KEY=\"your openai api key\"\n",
    "```"
   ]
  },
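  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can quickly confirm that the key is visible to this notebook's Python process. Environment variables are inherited from the shell that launched Jupyter, so a key exported in a different terminal after startup will not show up here. This is a minimal sketch: `openai_key_is_set` is a hypothetical helper and is not part of the homework scripts.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "\n",
    "def openai_key_is_set() -> bool:\n",
    "    # Hypothetical helper: True only if OPENAI_API_KEY exists and is non-empty\n",
    "    return bool(os.environ.get('OPENAI_API_KEY', '').strip())\n",
    "\n",
    "\n",
    "if not openai_key_is_set():\n",
    "    print('OPENAI_API_KEY is not set; export it before starting Jupyter.')\n",
    "```"
   ]
  },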
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reference Implementation Structure\n",
    "```\n",
    "homeworks/hw3/\n",
    "├── scripts/\n",
    "│   ├── generate_traces.py          # Generate Recipe Bot traces with parallel processing\n",
    "│   ├── label_data.py               # Use GPT-4o to label ground truth (150 examples)\n",
    "│   ├── split_data.py               # Split data into train/dev/test sets\n",
    "│   ├── develop_judge.py            # Develop LLM judge with few-shot examples\n",
    "│   ├── evaluate_judge.py           # Evaluate judge performance on test set\n",
    "│   └── run_full_evaluation.py      # Run judge on all traces and compute metrics\n",
    "├── data/\n",
    "│   ├── dietary_queries.csv         # 60 challenging edge case queries we crafted\n",
    "│   ├── raw_traces.csv              # Generated Recipe Bot traces (~2400 total)\n",
    "│   ├── labeled_traces.csv          # Traces with ground truth labels (150)\n",
    "│   ├── train_set.csv               # Training examples for few-shot (~23)\n",
    "│   ├── dev_set.csv                 # Development set for judge refinement (~60)\n",
    "│   └── test_set.csv                # Test set for final evaluation (~67)\n",
    "├── results/\n",
    "│   ├── judge_performance.json      # TPR/TNR metrics on test set\n",
    "│   ├── final_evaluation.json       # Results with confidence intervals\n",
    "│   └── judge_prompt.txt            # Final judge prompt\n",
    "├── README.md                       # Project spec and general project guide\n",
    "└── ai_evals_hw3_solution.ipynb     # Guide containing helpful Phoenix methods and links to Phoenix documentation\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Traces"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the start of the process. It begins with queries that each map to a dietary restriction.\n",
    "\n",
    "Traces are generated by `generate_traces.py`, which runs those queries through the model. Some key notes:\n",
    "\n",
    "- The `generate_traces.py` script calls `get_agent_response` **from the application code**. This is ideal because it minimizes differences between experiments and production.\n",
    "- Make sure to look at how the traces are generated within `generate_traces.py`.\n",
    "\n",
    "***You can view your traces in Phoenix, within the project you created!***"
   ]
  },
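  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The run below reports parallel generation of 2400 traces from 60 queries. That works out to 40 traces per query if every query is run equally often. Here is a minimal sketch of that fan-out pattern, assuming thread-based parallelism. `get_agent_response_stub` is a hypothetical stand-in for the app's `get_agent_response`, and this is not the code in `generate_traces.py`.\n",
    "\n",
    "```python\n",
    "from concurrent.futures import ThreadPoolExecutor\n",
    "\n",
    "\n",
    "def get_agent_response_stub(query):\n",
    "    # Hypothetical stand-in for get_agent_response; the real one calls an LLM\n",
    "    return f'Recipe for: {query}'\n",
    "\n",
    "\n",
    "def generate_traces(queries, runs_per_query, max_workers=8):\n",
    "    # One job per (query, run) pair; each run is an independent bot call\n",
    "    jobs = [(q, n) for q in queries for n in range(runs_per_query)]\n",
    "\n",
    "    def run(job):\n",
    "        query, trace_num = job\n",
    "        return {'query': query, 'trace_num': trace_num,\n",
    "                'response': get_agent_response_stub(query)}\n",
    "\n",
    "    # Threads suit I/O-bound API calls; pool.map keeps results in job order\n",
    "    with ThreadPoolExecutor(max_workers=max_workers) as pool:\n",
    "        return list(pool.map(run, jobs))\n",
    "\n",
    "\n",
    "traces = generate_traces(['Keto breakfast', 'Vegan dessert'], runs_per_query=3)\n",
    "print(len(traces))  # 6\n",
    "```"
   ]
  },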
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id,query,dietary_restriction\n",
      "1,I'm vegan but I really want to make something with honey - is there a good substitute? i am craving a yogurt breakfast,vegan\n",
      "2,Need a quick gluten-free breakfast. I hate eggs though.,gluten-free\n",
      "3,Keto breakfast that I can meal prep for the week,keto\n",
      "4,I'm dairy-free and also can't stand the taste of coconut milk. What dessert can I make?,dairy-free\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/anaconda3/envs/base2/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔭 OpenTelemetry Tracing Details 🔭\n",
      "|  Phoenix Project: recipe-agent\n",
      "|  Span Processor: BatchSpanProcessor\n",
      "|  Collector Endpoint: localhost:4317\n",
      "|  Transport: gRPC\n",
      "|  Transport Headers: {}\n",
      "|  \n",
      "|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.\n",
      "|  \n",
      "|  `register` has set this TracerProvider as the global OpenTelemetry default.\n",
      "|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.\n",
      "\n",
      "[bold blue]Recipe Bot Trace Generation\n",
      "==================================================\n",
      "Loaded 60 dietary queries\n",
      "Generating traces... This may take a while as we are making many LLM calls.\n",
      "Completed parallel generation of 2400 traces\n",
      "Successfully generated 2400 traces\n",
      "\n",
      "[bold]Summary Statistics:\n",
      "Total traces generated: 2400\n",
      "\n",
      "Traces per dietary restriction:\n",
      "  dairy-free: 120\n",
      "  diabetic-friendly: 160\n",
      "  gluten-free: 160\n",
      "  halal: 80\n",
      "  keto: 160\n",
      "  kosher: 40\n",
      "  low-carb: 240\n",
      "  low-sodium: 40\n",
      "  nut-free: 80\n",
      "  paleo: 200\n",
      "  pescatarian: 120\n",
      "  raw vegan: 120\n",
      "  raw vegan : 40\n",
      "  sugar-free: 120\n",
      "  vegan: 280\n",
      "  vegetarian: 280\n",
      "  whole30: 160\n"
     ]
    }
   ],
   "source": [
    "!head -5 data/dietary_queries.csv\n",
    "\n",
    "%run scripts/generate_traces.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Label the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to create ground truth labels for the traces. The `label_data.py` script generates these labels.\n",
    "\n",
    "We set aside a sample of 600 spans (out of 2400) to build our evaluator with, because labeling every trace is not cost-effective.\n",
    "\n",
    "Another reason is that we need ground truth labels for every span we sample, and producing them gets increasingly difficult as you select more spans for training/testing.\n",
    "\n",
    "In the solution code, an LLM generates the ground truth labels. This is bad practice: you should not trust an LLM to give you ground truth unless you verify its labels manually. It is simply a shortcut the solution takes for this assignment.\n",
    "\n",
    "Make sure to look at the labeling process in `label_data.py`.\n",
    "\n",
    "***After running the code below, you should see ground truth labels on some of your Phoenix traces!***\n",
    "\n",
    "***In the filter search bar, insert `annotations[\"Ground Truth Labels\"]` to see the traces with ground truth labels.***"
   ]
  },
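  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The labeling run below reports 577 PASS and 23 FAIL labels, then a balanced set of 50 PASS and 23 FAIL. The target is 50 per class, but only 23 failures are available, so the FAIL class keeps all of them. Here is a minimal sketch of that downsampling, assuming each labeled record is a dict with a `label` key. `balance_labels_sketch` is hypothetical and is not the implementation of `balance_labels` in `label_data.py`.\n",
    "\n",
    "```python\n",
    "import random\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def balance_labels_sketch(records, target_pass, target_fail, seed=42):\n",
    "    # Hypothetical helper: downsample each class to at most its target size\n",
    "    rng = random.Random(seed)\n",
    "    passes = [r for r in records if r['label'] == 'PASS']\n",
    "    fails = [r for r in records if r['label'] == 'FAIL']\n",
    "    # A class smaller than its target (here FAIL) keeps all of its records\n",
    "    balanced = (rng.sample(passes, min(target_pass, len(passes)))\n",
    "                + rng.sample(fails, min(target_fail, len(fails))))\n",
    "    rng.shuffle(balanced)  # avoid all PASS rows followed by all FAIL rows\n",
    "    return balanced\n",
    "\n",
    "\n",
    "# Toy data with the same class counts as the run below\n",
    "labeled = [{'id': i, 'label': 'FAIL' if i < 23 else 'PASS'} for i in range(600)]\n",
    "counts = Counter(r['label'] for r in balance_labels_sketch(labeled, 50, 50))\n",
    "print(counts['PASS'], counts['FAIL'])  # 50 23\n",
    "```"
   ]
  },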
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Allow nested asyncio event loops inside Jupyter so Phoenix can run LLM calls concurrently\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ground_truth_label,ground_truth_explanation,prompt,output,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,attributes.input.mime_type,attributes.output.value,attributes.openinference.span.kind,attributes.success,attributes.query,attributes.output.mime_type,attributes.id,attributes.trace_num,attributes.dietary_restriction,attributes.input.value\n",
      "PASS,\"The recipe for Herb-Roasted Chicken Thighs with Vegetables adheres to Whole30 guidelines. It includes compliant ingredients such as chicken thighs, olive oil, garlic powder, onion powder, paprika, dried thyme, sweet potatoes, and green beans. There are no grains, dairy, legumes, sugar, alcohol, or processed foods included in the recipe. The preparation method involves roasting, which is compliant with Whole30 as it does not involve any non-compliant cooking techniques or ingredients.\",\"You are an expert nutritionist and dietary specialist. Your task is to evaluate whether a recipe response properly adheres to the specified dietary restriction.\n",
      "Recipe Bot Trace Labeling with Phoenix Evals\n",
      "==================================================\n",
      "Loading traces from Phoenix...\n",
      "Loaded traces from Phoenix\n",
      "Labeling traces with Phoenix evals...\n",
      "Labeling 600 traces with Phoenix evals...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "llm_generate |██████████| 600/600 (100.0%) | ⏳ 02:03<00:00 |  4.87it/s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Completed labeling of 600 traces\n",
      "Logged evaluations to Phoenix\n",
      "Available traces: 577 PASS, 23 FAIL\n",
      "Balanced dataset: 50 PASS, 23 FAIL\n",
      "                 ground_truth_label  \\\n",
      "context.span_id                       \n",
      "5b3e86e5fa8c58d0               PASS   \n",
      "9c831dbfcc806787               FAIL   \n",
      "b73c68181fb0944d               PASS   \n",
      "da1bc90fcfae8063               PASS   \n",
      "d06915fc89f7c994               PASS   \n",
      "\n",
      "                                           ground_truth_explanation  \\\n",
      "context.span_id                                                       \n",
      "5b3e86e5fa8c58d0  The Keto Chocolate Avocado Mousse recipe adher...   \n",
      "9c831dbfcc806787  The recipe for Almond Flour Chocolate Chip Coo...   \n",
      "b73c68181fb0944d  The recipe for Garlic Shrimp Pasta adheres to ...   \n",
      "da1bc90fcfae8063  The recipe adheres to the vegan dietary restri...   \n",
      "d06915fc89f7c994  The recipe adheres to the vegan dietary restri...   \n",
      "\n",
      "                                                             prompt  \\\n",
      "context.span_id                                                       \n",
      "5b3e86e5fa8c58d0  You are an expert nutritionist and dietary spe...   \n",
      "9c831dbfcc806787  You are an expert nutritionist and dietary spe...   \n",
      "b73c68181fb0944d  You are an expert nutritionist and dietary spe...   \n",
      "da1bc90fcfae8063  You are an expert nutritionist and dietary spe...   \n",
      "d06915fc89f7c994  You are an expert nutritionist and dietary spe...   \n",
      "\n",
      "                               name span_kind parent_id  \\\n",
      "context.span_id                                           \n",
      "5b3e86e5fa8c58d0  Query_Information     CHAIN      None   \n",
      "9c831dbfcc806787  Query_Information     CHAIN      None   \n",
      "b73c68181fb0944d  Query_Information     CHAIN      None   \n",
      "da1bc90fcfae8063  Query_Information     CHAIN      None   \n",
      "d06915fc89f7c994  Query_Information     CHAIN      None   \n",
      "\n",
      "                                       start_time  \\\n",
      "context.span_id                                     \n",
      "5b3e86e5fa8c58d0 2025-08-05 19:46:49.621869+00:00   \n",
      "9c831dbfcc806787 2025-08-05 19:45:51.771569+00:00   \n",
      "b73c68181fb0944d 2025-08-05 19:45:15.407619+00:00   \n",
      "da1bc90fcfae8063 2025-08-05 19:48:33.780817+00:00   \n",
      "d06915fc89f7c994 2025-08-05 19:47:42.242426+00:00   \n",
      "\n",
      "                                         end_time status_code status_message  \\\n",
      "context.span_id                                                                \n",
      "5b3e86e5fa8c58d0 2025-08-05 19:46:59.100635+00:00       UNSET                  \n",
      "9c831dbfcc806787 2025-08-05 19:46:06.343950+00:00       UNSET                  \n",
      "b73c68181fb0944d 2025-08-05 19:45:26.781802+00:00       UNSET                  \n",
      "da1bc90fcfae8063 2025-08-05 19:48:44.395726+00:00       UNSET                  \n",
      "d06915fc89f7c994 2025-08-05 19:47:49.616004+00:00       UNSET                  \n",
      "\n",
      "                  ... attributes.input.mime_type  \\\n",
      "context.span_id   ...                              \n",
      "5b3e86e5fa8c58d0  ...                 text/plain   \n",
      "9c831dbfcc806787  ...                 text/plain   \n",
      "b73c68181fb0944d  ...                 text/plain   \n",
      "da1bc90fcfae8063  ...                 text/plain   \n",
      "d06915fc89f7c994  ...                 text/plain   \n",
      "\n",
      "                                            attributes.output.value  \\\n",
      "context.span_id                                                       \n",
      "5b3e86e5fa8c58d0  I have a delightful recipe for Keto Chocolate ...   \n",
      "9c831dbfcc806787  Here's a delightful recipe for **Almond Flour ...   \n",
      "b73c68181fb0944d  Certainly! Here's a delicious recipe for **Gar...   \n",
      "da1bc90fcfae8063  **Creamy Cashew Vegan Cheese**\\n\\nThis recipe ...   \n",
      "d06915fc89f7c994  Here's a delicious Vegan Protein Smoothie reci...   \n",
      "\n",
      "                 attributes.openinference.span.kind attributes.success  \\\n",
      "context.span_id                                                          \n",
      "5b3e86e5fa8c58d0                              CHAIN               True   \n",
      "9c831dbfcc806787                              CHAIN               True   \n",
      "b73c68181fb0944d                              CHAIN               True   \n",
      "da1bc90fcfae8063                              CHAIN               True   \n",
      "d06915fc89f7c994                              CHAIN               True   \n",
      "\n",
      "                                                   attributes.query  \\\n",
      "context.span_id                                                       \n",
      "5b3e86e5fa8c58d0  I'm on keto but I'm craving something sweet. H...   \n",
      "9c831dbfcc806787  Diabetic-friendly dessert that doesn't use art...   \n",
      "b73c68181fb0944d  I'm pescatarian but I hate fish. Can you give ...   \n",
      "da1bc90fcfae8063  Vegan cheese recipe but it should taste good. ...   \n",
      "d06915fc89f7c994   Vegan protein smoothie that doesn't taste chalky   \n",
      "\n",
      "                 attributes.output.mime_type  attributes.id  \\\n",
      "context.span_id                                               \n",
      "5b3e86e5fa8c58d0                  text/plain             15   \n",
      "9c831dbfcc806787                  text/plain             12   \n",
      "b73c68181fb0944d                  text/plain             10   \n",
      "da1bc90fcfae8063                  text/plain             23   \n",
      "d06915fc89f7c994                  text/plain             19   \n",
      "\n",
      "                 attributes.trace_num attributes.dietary_restriction  \\\n",
      "context.span_id                                                        \n",
      "5b3e86e5fa8c58d0                   32                           keto   \n",
      "9c831dbfcc806787                   18              diabetic-friendly   \n",
      "b73c68181fb0944d                   12                    pescatarian   \n",
      "da1bc90fcfae8063                   16                          vegan   \n",
      "d06915fc89f7c994                   27                          vegan   \n",
      "\n",
      "                                             attributes.input.value  \n",
      "context.span_id                                                      \n",
      "5b3e86e5fa8c58d0  I'm on keto but I'm craving something sweet. H...  \n",
      "9c831dbfcc806787  Diabetic-friendly dessert that doesn't use art...  \n",
      "b73c68181fb0944d  I'm pescatarian but I hate fish. Can you give ...  \n",
      "da1bc90fcfae8063  Vegan cheese recipe but it should taste good. ...  \n",
      "d06915fc89f7c994   Vegan protein smoothie that doesn't taste chalky  \n",
      "\n",
      "[5 rows x 23 columns]\n",
      "\n",
      "Labeling Summary:\n",
      "Total labeled traces: 600\n",
      "\n",
      "Label distribution:\n",
      "  PASS: 50\n",
      "  FAIL: 23\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "!head -2 data/labeled_traces.csv\n",
    "\n",
    "from scripts.label_data import (\n",
    "    LABELING_PROMPT,\n",
    "    balance_labels,\n",
    "    generate_phoenix_labels,\n",
    "    load_traces_from_phoenix,\n",
    ")\n",
    "\n",
    "print(\"Recipe Bot Trace Labeling with Phoenix Evals\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "# Load traces from Phoenix\n",
    "print(\"Loading traces from Phoenix...\")\n",
    "trace_df = load_traces_from_phoenix()\n",
    "\n",
    "if trace_df.empty:\n",
    "    # exit() would end the kernel session in Jupyter; raising stops only this cell\n",
    "    raise RuntimeError(\"No traces found in Phoenix! Run generate_traces.py first to generate traces.\")\n",
    "\n",
    "# Label traces with Phoenix evals\n",
    "print(\"Labeling traces with Phoenix evals...\")\n",
    "test_results = generate_phoenix_labels(trace_df, prompt=LABELING_PROMPT, sample_size=600)\n",
    "\n",
    "labeled_df = balance_labels(test_results, target_positive=50, target_negative=50)\n",
    "print(labeled_df.head())\n",
    "# Print summary statistics\n",
    "print(\"\\nLabeling Summary:\")\n",
    "print(f\"Total labeled traces: {len(test_results)}\")\n",
    "\n",
    "label_counts = labeled_df[\"ground_truth_label\"].value_counts()\n",
    "print(\"\\nLabel distribution:\")\n",
    "for label, count in label_counts.items():\n",
    "    print(f\"  {label}: {count}\")\n",
    "\n",
    "labeled_df.to_csv(\"data/labeled_traces.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split Your Labeled Data\n",
    "\n",
    "In any data-driven approach, you should split your data into separate sets.\n",
    "\n",
    "They may be called `train`, `validation`, and `test`, or `train`, `dev`, and `test`. Either way, you need 3 sets. This is true in machine learning as well as in statistics. The three sets serve different purposes:\n",
    "\n",
    "- **train**: You can do anything with this data and \"train\" your model on it. Here, training means using it to create few-shot examples for your prompt. You could also do RAG against these examples, use them for fine-tuning, or use them any other way. They are fair game for everything.\n",
    "- **validation**: This is what you regularly measure against during development. These examples cannot be used for RAG, put in your prompt, or trained on. Once you have a working solution, you iterate on it by testing how well it performs on the `validation` set.\n",
    "- **test**: This is your ultimate protection. It ensures that your experiment results will translate to production, so you can predict what the impact of your change will be. Every time you measure against it and look at it, you lose some of that protection. So do so very sparingly!\n",
    "\n",
    "> This is to make sure your model can *generalize* beyond the specific examples you have seen. Failures of this kind go by many names in different contexts, such as overfitting, p-hacking, data leakage, lookahead bias, and more."
   ]
  },
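  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One cheap guard against data leakage is to check that no example appears in more than one split. Here is a minimal sketch, assuming every example carries a unique ID (the labeled traces include `context.span_id`, for instance). `check_disjoint` is a hypothetical helper and is not the repo's `validate_splits`.\n",
    "\n",
    "```python\n",
    "def check_disjoint(**splits):\n",
    "    # Hypothetical helper: raise if any ID appears in two different splits\n",
    "    names = list(splits)\n",
    "    for i, a in enumerate(names):\n",
    "        for b in names[i + 1:]:\n",
    "            shared = set(splits[a]) & set(splits[b])\n",
    "            if shared:\n",
    "                raise ValueError(f'{a} and {b} share IDs: {sorted(shared)[:5]}')\n",
    "    return True\n",
    "\n",
    "\n",
    "print(check_disjoint(train=[1, 2, 3], dev=[4, 5], test=[6, 7, 8]))  # True\n",
    "```"
   ]
  },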
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stratified Splitting Script"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `split_data.py` script uses a more advanced splitting approach called *stratified splitting*.\n",
    "\n",
    "Instead of assigning examples to the train/dev/test sets purely at random, it ensures that each category is represented proportionally in every set. If 10% of the samples are `FAIL`, it ensures that roughly 10% of the samples in each set are `FAIL`, so we don't end up with imbalanced sets by random chance."
   ]
  },
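  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a minimal sketch of stratified splitting in plain Python. It shuffles each class separately, then cuts each class by the same ratios, so every split inherits the overall class mix. `stratified_split_sketch` is hypothetical and is not the code in `split_data.py`. It assumes records are dicts with a `label` key. Because each class is rounded separately, split sizes can differ slightly from the target ratios.\n",
    "\n",
    "```python\n",
    "import random\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "def stratified_split_sketch(records, ratios=(0.15, 0.40), key='label', seed=0):\n",
    "    # Hypothetical helper; ratios are (train, dev) and test gets the remainder\n",
    "    groups = defaultdict(list)\n",
    "    for r in records:\n",
    "        groups[r[key]].append(r)\n",
    "    rng = random.Random(seed)\n",
    "    train, dev, test = [], [], []\n",
    "    for items in groups.values():\n",
    "        rng.shuffle(items)\n",
    "        n_train = round(len(items) * ratios[0])\n",
    "        n_dev = round(len(items) * ratios[1])\n",
    "        # Cut each class separately so each split keeps that class's share\n",
    "        train += items[:n_train]\n",
    "        dev += items[n_train:n_train + n_dev]\n",
    "        test += items[n_train + n_dev:]\n",
    "    return train, dev, test\n",
    "\n",
    "\n",
    "# 100 toy records, 30% FAIL\n",
    "records = [{'id': i, 'label': 'FAIL' if i % 10 < 3 else 'PASS'} for i in range(100)]\n",
    "for name, split in zip(('train', 'dev', 'test'), stratified_split_sketch(records)):\n",
    "    fails = sum(r['label'] == 'FAIL' for r in split)\n",
    "    print(name, len(split), f'{fails / len(split):.0%} FAIL')\n",
    "```"
   ]
  },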
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data Splitting for LLM Judge Development\n",
      "==================================================\n",
      "Loaded 72 labeled traces\n",
      "Splitting data into train/dev/test sets...\n",
      "Data splits validation passed!\n",
      "\n",
      "[bold]Data Split Statistics:\n",
      "Total traces: 72\n",
      "Train: 10 (13.9%)\n",
      "Dev: 29 (40.3%)\n",
      "Test: 33 (45.8%)\n",
      "\n",
      "[bold]Label Distribution:\n",
      "Train:\n",
      "  FAIL: 3 (30.0%)\n",
      "  PASS: 7 (70.0%)\n",
      "Dev:\n",
      "  FAIL: 9 (31.0%)\n",
      "  PASS: 20 (69.0%)\n",
      "Test:\n",
      "  FAIL: 10 (30.3%)\n",
      "  PASS: 23 (69.7%)\n",
      "\n",
      "[bold]Dietary Restrictions in Train Set:\n",
      "  dairy-free: 1\n",
      "  diabetic-friendly: 11\n",
      "  gluten-free: 3\n",
      "  keto: 3\n",
      "  low-carb: 1\n",
      "  paleo: 1\n",
      "  raw vegan: 4\n",
      "  vegan: 4\n",
      "  vegetarian: 5\n",
      "\n",
      "Data splitting completed successfully!\n",
      "\n",
      "Split Rationale:\n",
      "- Train (15%): Small set for few-shot examples in judge prompt\n",
      "- Dev (40%): Large set for iterative judge development and tuning\n",
      "- Test (45%): Large set for final unbiased evaluation of judge performance\n"
     ]
    }
   ],
   "source": [
    "from scripts.split_data import (\n",
    "    load_labeled_traces,\n",
    "    print_split_statistics,\n",
    "    stratified_split,\n",
    "    validate_splits,\n",
    ")\n",
    "\n",
    "print(\"Data Splitting for LLM Judge Development\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "# Load labeled traces\n",
    "labeled_path = \"data/labeled_traces.csv\"\n",
    "traces = load_labeled_traces(labeled_path)\n",
    "print(f\"Loaded {len(traces)} labeled traces\")\n",
    "\n",
    "# Split the data\n",
    "print(\"Splitting data into train/dev/test sets...\")\n",
    "train_df, dev_df, test_df = stratified_split(\n",
    "    traces,\n",
    "    train_ratio=0.15,  # Small train set for few-shot examples\n",
    "    dev_ratio=0.40,  # Larger dev set for judge development\n",
    "    test_ratio=0.45,  # Large test set for final evaluation\n",
    ")\n",
    "\n",
    "# Validate splits\n",
    "if not validate_splits(train_df, dev_df, test_df):\n",
    "    # Raise instead of exit(), which would end the kernel session in Jupyter\n",
    "    raise RuntimeError(\"Data split validation failed!\")\n",
    "\n",
    "# Save splits locally\n",
    "train_path = \"data/train_set.csv\"\n",
    "dev_path = \"data/dev_set.csv\"\n",
    "test_path = \"data/test_set.csv\"\n",
    "train_df.to_csv(train_path, index=False)\n",
    "dev_df.to_csv(dev_path, index=False)\n",
    "test_df.to_csv(test_path, index=False)\n",
    "\n",
    "# Print statistics\n",
    "print_split_statistics(train_df, dev_df, test_df)\n",
    "\n",
    "print(\"\\nData splitting completed successfully!\")\n",
    "print(\"\\nSplit Rationale:\")\n",
    "print(\"- Train (15%): Small set for few-shot examples in judge prompt\")\n",
    "print(\"- Dev (40%): Large set for iterative judge development and tuning\")\n",
    "print(\"- Test (45%): Large set for final unbiased evaluation of judge performance\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload Datasets to Phoenix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we upload our train/dev/test splits to Phoenix as datasets, because we will rely heavily on Phoenix experiments to build the LLM-as-Judge evaluator.\n",
    "\n",
    "***You can view your datasets in the Datasets tab in Phoenix.***"
   ]
  },
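  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `input_keys`, `output_keys`, and `metadata_keys` arguments in the next cell decide which dataframe columns become each example's input, output, and metadata. The plain-Python sketch below only illustrates that partitioning. `row_to_example` is hypothetical and is not part of the Phoenix client. It also ignores any column not listed in one of the three key lists, which may differ from how Phoenix handles unlisted columns.\n",
    "\n",
    "```python\n",
    "def row_to_example(row, input_keys, output_keys, metadata_keys):\n",
    "    # Hypothetical helper: split one flat row into three dicts\n",
    "    def pick(keys):\n",
    "        return {k: row[k] for k in keys}\n",
    "    return {'input': pick(input_keys),\n",
    "            'output': pick(output_keys),\n",
    "            'metadata': pick(metadata_keys)}\n",
    "\n",
    "\n",
    "row = {\n",
    "    'attributes.query': 'Keto breakfast that I can meal prep for the week',\n",
    "    'attributes.output.value': 'Here is a make-ahead egg muffin recipe...',\n",
    "    'ground_truth_label': 'PASS',\n",
    "    'ground_truth_explanation': 'All ingredients are low-carb.',\n",
    "    'attributes.dietary_restriction': 'keto',\n",
    "    'attributes.trace_num': 3,\n",
    "    'context.span_id': 'abc123',  # in no key list, so this sketch drops it\n",
    "}\n",
    "example = row_to_example(\n",
    "    row,\n",
    "    input_keys=['attributes.query'],\n",
    "    output_keys=['attributes.output.value', 'ground_truth_label',\n",
    "                 'ground_truth_explanation'],\n",
    "    metadata_keys=['attributes.dietary_restriction', 'attributes.trace_num'],\n",
    ")\n",
    "print(example['output']['ground_truth_label'])  # PASS\n",
    "```"
   ]
  },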
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "📤 Uploading dataset...\n",
      "💾 Examples uploaded: http://127.0.0.1:6006/datasets/RGF0YXNldDox/examples\n",
      "🗄️ Dataset version ID: RGF0YXNldFZlcnNpb246MQ==\n",
      "📤 Uploading dataset...\n",
      "💾 Examples uploaded: http://127.0.0.1:6006/datasets/RGF0YXNldDoy/examples\n",
      "🗄️ Dataset version ID: RGF0YXNldFZlcnNpb246Mg==\n",
      "📤 Uploading dataset...\n",
      "💾 Examples uploaded: http://127.0.0.1:6006/datasets/RGF0YXNldDoz/examples\n",
      "🗄️ Dataset version ID: RGF0YXNldFZlcnNpb246Mw==\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Upload the data splits to Phoenix\n",
    "from phoenix.client import AsyncClient\n",
    "\n",
    "phoenix_client = AsyncClient()\n",
    "\n",
    "train_df = pd.read_csv(\"data/train_set.csv\")\n",
    "dev_df = pd.read_csv(\"data/dev_set.csv\")\n",
    "test_df = pd.read_csv(\"data/test_set.csv\")\n",
    "\n",
    "train_dataset = await phoenix_client.datasets.create_dataset(\n",
    "    dataframe=train_df,\n",
    "    name=\"train_set\",\n",
    "    input_keys=[\"attributes.query\"],\n",
    "    output_keys=[\"attributes.output.value\", \"ground_truth_label\", \"ground_truth_explanation\"],\n",
    "    metadata_keys=[\"attributes.dietary_restriction\", \"attributes.trace_num\"],\n",
    ")\n",
    "\n",
    "dev_dataset = await phoenix_client.datasets.create_dataset(\n",
    "    dataframe=dev_df,\n",
    "    name=\"dev_set\",\n",
    "    input_keys=[\"attributes.query\"],\n",
    "    output_keys=[\"attributes.output.value\", \"ground_truth_label\", \"ground_truth_explanation\"],\n",
    "    metadata_keys=[\"attributes.dietary_restriction\", \"attributes.trace_num\"],\n",
    ")\n",
    "\n",
    "test_dataset = await phoenix_client.datasets.create_dataset(\n",
    "    dataframe=test_df,\n",
    "    name=\"test_set\",\n",
    "    input_keys=[\"attributes.query\"],\n",
    "    output_keys=[\"attributes.output.value\", \"ground_truth_label\", \"ground_truth_explanation\"],\n",
    "    metadata_keys=[\"attributes.dietary_restriction\", \"attributes.trace_num\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Develop Your LLM-as-Judge Prompt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use Phoenix [Datasets](../phoenix_methods_guide.md#datasets) and [Experiments](../phoenix_methods_guide.md#experiments) to test our evaluator on the dev and test sets. Since we are still developing the judge, we will run experiments on the dev set and see how our evaluator performs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start off with a simple prompt for our LLM-as-Judge evaluator and see how it performs on our dev set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_prompt = \"\"\"\n",
    "Query: {attributes.query}\n",
    "Dietary Restriction: {attributes.dietary_restriction}\n",
    "Model Output: {attributes.output.value}\n",
    "\n",
    "Return your answer in the following JSON format:\n",
    "\n",
    "\"label\": \"PASS\" or \"FAIL\"\n",
    "\"explanation\": \"Explanation for your answer\"\n",
    "\"\"\""
   ]
  },
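  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The prompt above asks for JSON but only sketches the shape of the object (there are no braces), so the judge's replies may not parse cleanly. Here is a minimal sketch of defensive parsing, assuming each reply arrives as a plain string. `parse_judge_output` and the `UNPARSEABLE` sentinel are hypothetical, and Phoenix's own eval helpers may handle this differently.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import re\n",
    "\n",
    "\n",
    "def parse_judge_output(text):\n",
    "    # Hypothetical parser: prefer strict JSON, fall back to scanning for a label\n",
    "    try:\n",
    "        data = json.loads(text)\n",
    "        label = str(data.get('label', '')).strip().upper()\n",
    "        if label in ('PASS', 'FAIL'):\n",
    "            return {'label': label, 'explanation': data.get('explanation', '')}\n",
    "    except (json.JSONDecodeError, AttributeError):\n",
    "        pass  # not JSON, or JSON that is not an object\n",
    "    # Fallback: first standalone PASS or FAIL word (case-insensitive)\n",
    "    words = re.findall('[A-Z]+', text.upper())\n",
    "    label = next((w for w in words if w in ('PASS', 'FAIL')), 'UNPARSEABLE')\n",
    "    return {'label': label, 'explanation': text.strip()}\n",
    "\n",
    "\n",
    "reply = json.dumps({'label': 'fail', 'explanation': 'Honey is not vegan.'})\n",
    "print(parse_judge_output(reply)['label'])  # FAIL\n",
    "print(parse_judge_output('label: PASS, explanation: no gluten')['label'])  # PASS\n",
    "```"
   ]
  },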
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's build an evaluator using this prompt and see how it performs on our dev set by running a Phoenix experiment.\n",
    "\n",
    "A Phoenix experiment has two main components: a task and a set of experiment evals.\n",
    "\n",
    "The task defines the action the experiment performs on each example. For us, that means running our evaluator on a trace.\n",
    "\n",
    "The experiment evals define how you want to score that action. Since we are experimenting with our LLM-as-judge evaluator, these evals measure how well that evaluator performs. They are essentially evals on top of our evaluator, which help us see whether it is working well!"
   ]
  },
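  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each experiment eval in the summary below (`eval_tp`, `eval_fp`, `eval_tn`, `eval_fn`) is a per-example True/False check, so its `avg_score` is the fraction of examples that fall into that confusion-matrix cell. Here is a minimal sketch in plain Python, independent of the Phoenix API (`confusion_flags` and `tpr_tnr` are hypothetical names). It treats PASS as the positive class, which matches the summary: its 20 true positives line up with the 20 PASS examples in the dev split.\n",
    "\n",
    "```python\n",
    "def confusion_flags(predicted, actual, positive='PASS'):\n",
    "    # One boolean per confusion-matrix cell for a single example\n",
    "    pred_pos, true_pos = predicted == positive, actual == positive\n",
    "    return {'tp': pred_pos and true_pos,\n",
    "            'fp': pred_pos and not true_pos,\n",
    "            'tn': not pred_pos and not true_pos,\n",
    "            'fn': not pred_pos and true_pos}\n",
    "\n",
    "\n",
    "def tpr_tnr(pairs, positive='PASS'):\n",
    "    # pairs: iterable of (judge_label, ground_truth_label)\n",
    "    counts = {'tp': 0, 'fp': 0, 'tn': 0, 'fn': 0}\n",
    "    for predicted, actual in pairs:\n",
    "        for cell, hit in confusion_flags(predicted, actual, positive).items():\n",
    "            counts[cell] += hit  # True counts as 1\n",
    "    pos = counts['tp'] + counts['fn']\n",
    "    neg = counts['tn'] + counts['fp']\n",
    "    tpr = counts['tp'] / pos if pos else float('nan')  # share of PASS caught\n",
    "    tnr = counts['tn'] / neg if neg else float('nan')  # share of FAIL caught\n",
    "    return tpr, tnr\n",
    "\n",
    "\n",
    "# Counts from the dev-set summary below: 20 TP, 8 FP, 1 TN, 0 FN\n",
    "dev_pairs = [('PASS', 'PASS')] * 20 + [('PASS', 'FAIL')] * 8 + [('FAIL', 'FAIL')]\n",
    "tpr, tnr = tpr_tnr(dev_pairs)\n",
    "print(f'TPR={tpr:.3f} TNR={tnr:.3f}')  # TPR=1.000 TNR=0.111\n",
    "```\n",
    "\n",
    "With these counts, accuracy is 21 of 29 (about 0.724), yet the judge catches only 1 of the 9 FAIL examples. Reporting TPR and TNR separately exposes that weakness; accuracy alone hides it."
   ]
  },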
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running evaluator experiment...\n",
      "🧪 Experiment started.\n",
      "📺 View dataset experiments: http://127.0.0.1:6006/datasets/RGF0YXNldDoy/experiments\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDoy/compare?experimentId=RXhwZXJpbWVudDo2\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Task runs completed.\n",
      "🧠 Evaluation started.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "running tasks |██████████| 29/29 (100.0%) | ⏳ 00:35<00:00 |  1.23s/it\n",
      "running experiment evaluations |██████████| 145/145 (100.0%) | ⏳ 00:17<00:00 |  8.47it/s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDoy/compare?experimentId=RXhwZXJpbWVudDo2\n",
      "\n",
      "Experiment Summary (08/04/25 11:05 PM -0700)\n",
      "--------------------------------------------\n",
      "  evaluator   n  n_scores  avg_score  n_labels              top_2_labels\n",
      "0  accuracy  29        29   0.724138        29  {'True': 21, 'False': 8}\n",
      "1   eval_fn  29        29   0.000000        29             {'False': 29}\n",
      "2   eval_fp  29        29   0.275862        29  {'False': 21, 'True': 8}\n",
      "3   eval_tn  29        29   0.034483        29  {'False': 28, 'True': 1}\n",
      "4   eval_tp  29        29   0.689655        29  {'True': 20, 'False': 9}\n",
      "\n",
      "Tasks Summary (08/04/25 11:04 PM -0700)\n",
      "---------------------------------------\n",
      "   n_examples  n_runs  n_errors\n",
      "0          29      29         0\n",
      "Experiment completed! Experiment ID: RXhwZXJpbWVudDo2\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "<span style=\"font-weight: bold\">Judge Performance on Dev Set:</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "\u001b[1mJudge Performance on Dev Set:\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Positive Rate <span style=\"font-weight: bold\">(</span>TPR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.000</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Positive Rate \u001b[1m(\u001b[0mTPR\u001b[1m)\u001b[0m: \u001b[1;36m1.000\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Negative Rate <span style=\"font-weight: bold\">(</span>TNR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.111</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Negative Rate \u001b[1m(\u001b[0mTNR\u001b[1m)\u001b[0m: \u001b[1;36m0.111\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Balanced Accuracy: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.556</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Balanced Accuracy: \u001b[1;36m0.556\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAf8AAAG2CAYAAABxpo8aAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMlJJREFUeJzt3Qt4VNW58PF3EiDhlnAnCYSb3OVqUAyiwAFB9CAXbZXSQ5CLX3uEoggqXiCKmh6poBYErSLaioAWUNGmB1GIGNBys2IlEogkkYsgQkg4CWFmvmctO5NMSCCTPZOZzPr/+qwnM3v23rOS+vDu911rr21zOp1OAQAAxggLdAcAAED1IvgDAGAYgj8AAIYh+AMAYBiCPwAAhiH4AwBgGII/AACGIfgDAGAYgj8AAIYh+AMAYBiCPwAAfpCSkiJXX321NGzYUFq0aCFjxoyRjIwMj30KCwvlnnvukaZNm0qDBg3ktttuk+PHj1/yvGpV/nnz5klsbKzUrVtXhg0bJgcOHPCqbwR/AAD8YOvWrTqw79ixQzZt2iTFxcUyfPhwKSgocO9z3333yfvvvy9vv/223v/IkSMybty4S573mWeekRdeeEGWL18un3/+udSvX19GjBihLyQqy8aDfQAA8L8TJ07oCoAK8jfccIOcOXNGmjdvLqtWrZLbb79d77N//37p1q2bbN++Xa699tqLzqFCdlxcnNx///0ye/ZsvU2dp2XLlrJy5Uq58847K9WXWlKDORwOfZWkSio2my3Q3QEAeEkFs7Nnz+qAFhbmv2K0yorPnz/vk/7aysSbiIgI3S5HBWmlSZMm+ueuXbt0NUCV7V26du0qbdq0qTD4Z2VlybFjxzyOiY6Olv79++tjjAj+KvDHx8cHuhsAAItycnKkdevWfgv87ds2kGM/2C2fq0GDBpKfn++xbf78+ZKcnHzZZPXee++V6667Tnr06KG3qSBep04dadSokce+KotXn5XHtV3tU9ljQi74q4xfGdRovNSy1Ql0dwC/yJnUJdBdAPzGXlQoB198wv3vuT+ojF8F/sO72klUw6pXF/LOOqRtwnf6QiUqKsq9vTJZvxr737dvn2zbtk2CQY0O/q7Siwr8BH+EqvCIyEB3AfC76hi6bdDQpltVOeTnY1XgLx38L2f69OmyceNGSUtL86huxMTE6AuT06dPe2T/ara/+qw8ru1qHzXbv/Qxffr0qXSfmO0PADCC3emw3LydH6AC//r16+Xjjz+W9u3be3yekJAgtWvXls2bN7u3qVsBs7OzJTExsdxzqnOoC4DSx+Tl5elZ/xUdE3KZPwAAleUQp25V5e2xqtSvZvK/++67eljDNSavJuip+/PVzylTpsisWbP0JEBVTZgxY4YO4qUn+6lJgGrNgLFjx+oKiZo78OSTT0qnTp30xcBjjz2mJ0yqdQQqi+APAIAfLFu2TP8cPHiwx/bXXntNJk2apF8vXrxY3+WgFvcpKirS9+u/+OKLHvuraoDrTgHlgQce0GsF3H333XrIYODAgZKamiqRkZFm3OevSh3qymlo4yTG/BGysu/uFuguAH6d8Pft4od1cPNmHL0qseJIRmvLE/7iuuT6ta/VhcwfAGAEu9OpW1VZOTbYMOEPAADDkPkDAIxQ3RP+ghnBHwBgBBW87QR/jbI/AACGIfMHABiBsn8Jgj8AwAjM9i9B2R8AAMOQ+QMAjKBW5ndYPD5UEPwBAEawW5ztb2fMHwCAmsXu/LlVlZVjgw1j/gAAGIbMHwBgBMb8SxD8AQBGcIhN7GKzdHyooOwPAIBhyPwBAEZwOH9uVWXl2GBD8AcAGMFusexvp+wPAABqKjJ/AIARyPxLEPwBAEZwOG26VZWVY4MNZX8AAAxD5g8AMAJl/xIEfwCAEewSpltV2SV0EPwBAEZwWhzzdzLmDwAAaioyfwCAERjzL0HwBwAYwe4M062q7CG0vC9lfwAADEPmDwAw
gnokr8NCzuuQ0En9Cf4AACMw5l+Csj8AAIYh8wcAGMH6hD+nhAqCPwDAoDF/Cw/2oewPAABqKjJ/AIARHBbX9neE0Gx/Mn8AgFFj/nYLzRtpaWkyatQoiYuLE5vNJhs2bPD4XG0rry1cuLDCcyYnJ1+0f9euXb3+W5D5AwCMyfyr8z7/goIC6d27t0yePFnGjRt30edHjx71eP+3v/1NpkyZIrfddtslz3vllVfKRx995H5fq5b3oZzgDwCAH4wcOVK3isTExHi8f/fdd2XIkCHSoUOHS55XBfuyx3qL4A8AMILdadOtqlzH5uXleWyPiIjQzYrjx4/LBx98IK+//vpl9z1w4IAeSoiMjJTExERJSUmRNm3aePV9jPkDAIygJvtZbUp8fLxER0e7mwq+Vqmg37Bhw3KHB0rr37+/rFy5UlJTU2XZsmWSlZUl119/vZw9e9ar7yPzBwDACzk5ORIVFeV+bzXrV1asWCETJkzQ2fyllB5G6NWrl74YaNu2raxdu1bPF6gsgj8AwAgOZ5huVeX49wp/KvCXDv5Wffrpp5KRkSFr1qzx+thGjRpJ586dJTMz06vjKPsDAIzgq7K/r7366quSkJCg7wzwVn5+vhw8eFBiY2O9Oo7gDwCAH6jAvHfvXt0UNT6vXmdnZ7v3UZMH3377bZk6dWq55xg6dKgsWbLE/X727NmydetW+e677yQ9PV3Gjh0r4eHhMn78eK/6RtkfAGAER6kZ+1U93hs7d+7Ut+65zJo1S/9MSkrSk/aU1atXi9PprDB4q6z+5MmT7ve5ubl63x9//FGaN28uAwcOlB07dujX3iD4AwCMYH2RnzCv9h88eLAO7Jdy991361YRleGXpi4WfIGyPwAAhiHzBwAYoSrr85dm5dhgQ/AHABjBITbdqsrKscGG4A8AMAKZf4nQ+U0AAEClkPkDAIxgdaEeewjlywR/AIARHE6bblVl5dhgEzqXMQAAoFLI/AEARlCL9Fgp3TtCKF8m+AMAjGD9qX5hEipC5zcBAACVQuYPADCCXWy6VZWVY4MNwR8AYATK/iVC5zcBAACVQuYPADCC3WLp3i6hg+APADACZf8SBH8AgBF4sE+J0PlNAABApZD5AwCM4BSbOCyM+Tu51Q8AgJqFsn+J0PlNAABApZD5AwCMwCN9SxD8AQBGsFt8qp89hIrlofObAACASiHzBwAYgbJ/CYI/AMAIDgnTraqsHBtsQuc3AQAAlULmDwAwgt1p062qrBwbbAj+AAAjMOZfguAPADCC0+JT/Zys8AcAAGoqMn8AgBHsYtOtqqwcG2wI/gAAIzic1sbtHU4JGZT9AQAwDJk/KqVHwmm5bXKOdOx+Vpq2OC8LZlwp2z9uHuhuAT4RZnPIf/ffKf/Z9VtpVv+cnMivLxu+6SIvfZEgEkKlXtM5LE74czDhz7eWLl0q7dq1k8jISOnfv7988cUXge4Syoisa5esjPry4pOdAt0VwOem9Nsjd/T6Wp7ecr3c+sadsuiza2Vywl6Z0PurQHcNPuQQm+XmjbS0NBk1apTExcWJzWaTDRs2eHw+adIkvb10u+mmm6olZgY8+K9Zs0ZmzZol8+fPl927d0vv3r1lxIgR8sMPPwS6ayhl57am8sYLHWT7ZrJ9hJ4+scflk0PtJO27tnLkbJRsyrxC0rNbS88Y/h1C1RUUFOiYpoJ1RVSwP3r0qLu99dZb1RIzAx78Fy1aJNOmTZO77rpLunfvLsuXL5d69erJihUrAt01AIbYe7Sl9I//Xto2Oq3fd2l2Uq6KOyafftcm0F2DH1b4s1to3hg5cqQ8+eSTMnbs2Ar3iYiIkJiYGHdr3LhxtcTMgI75nz9/Xnbt2iVz5851bwsLC5Nhw4bJ9u3bA9k1AAZ55R9XSf06xfL+xLfE7giT8DCHvJDeXz7I6BzoriHEx/y3bNkiLVq00EH/P/7jP/TFQtOmTf0eMwMa/E+ePCl2u11atmzpsV29379//0X7
FxUV6eaSl5dXLf0EENpu6pwp/9nlW3kwdZhk/thEujY/KQ/e8Jn8UFBP3vuma6C7hyCTVyb2qOxdNW+pkv+4ceOkffv2cvDgQXn44Yd1tUAF8vDwcMsxM2Rm+6ekpMjjjz8e6G4ACDH3D9wur+y8Sv727c8TWg/82FRiG+bL1H57CP4hRE/as3Kfv/x8bHx8vMd2Nf6enJzs9fnuvPNO9+uePXtKr1695IorrtDVgKFDh4o/BTT4N2vWTF/dHD9+3GO7eq/GPspSpQ410aH01VfZ/xMAwFuRtS6Is8wCLipIhNlCaFUXiLMKM/bLHq/k5ORIVFSUe3tVsv7ydOjQQcfFzMzMcoO/tzEzaCf81alTRxISEmTz5s3ubQ6HQ79PTEy8aH/1B1Z/8NIN1SOy3gXp0PWsbkrL1oX6dfPYwkB3DbBsS1Y7mXb1brmh3WGJa5gnQ684JBP7fimbD7YPdNfgh6f6OSw0pWwc8lXwz83NlR9//FFiY2N9EjODuuyvMvmkpCTp16+fXHPNNfLcc8/p2yPUTEYEj05XnpX/Wfml+/3dDx7UPzdtaCmLH+kWwJ4B1j29ZaDMSPxCHh2SJk3q/Z9e5Oftfd1l2ef9At011GD5+fk6i3fJysqSvXv3SpMmTXRTw9i33XabztrVmP8DDzwgHTt21LfuuagKgLpbYPr06T6NmQEP/nfccYecOHFC5s2bJ8eOHZM+ffpIamrqRRMaEFhf/aOx3Hzl4EB3A/CLc8V15H/SBuqG0FXds/137twpQ4YMcb93DVur4L1s2TL55z//Ka+//rqcPn1aLwQ0fPhwWbBggUclQV0UqIl+vo6ZNqez7EhXzaHG/KOjo2Vo4ySpZasT6O4AfpF9N5UVhC57UaF8u/hhOXPmjN+Gcl2xYvT/Tpba9aseK4oLzsu7w1f4ta/VJeCL/AAAgOoV8LI/AADVoSrr85dm5dhgQ/AHABih9Iz9qrBybLCh7A8AgGHI/AEARiDzL0HwBwAYgeBfgrI/AACGIfMHABiBzL8EwR8AYAS1op21B/uEDoI/AMAIZP4lGPMHAMAwZP4AACOQ+Zcg+AMAjEDwL0HZHwAAw5D5AwCMQOZfguAPADCC02nTraqsHBtsKPsDAGAYMn8AgBHUAj9WFvlxWDg22BD8AQBGYMy/BGV/AAAMQ+YPADACE/5KEPwBAEag7F+C4A8AMAKZfwnG/AEAMAyZPwDACCpzt1K6d4ZQ5k/wBwAYwakDuLXjQwVlfwAADEPmDwAwglqhT/2vqljhDwCAGobZ/iUo+wMAYBgyfwCAEdRMfxuL/GgEfwCAEdRMf0uz/Z0SMij7AwBgGDJ/AIARmPBXguAPADACwb8EwR8AYAQm/JVgzB8AAD9IS0uTUaNGSVxcnNhsNtmwYYP7s+LiYnnwwQelZ8+eUr9+fb3PxIkT5ciRI5c8Z3Jysj5X6da1a1ev+0bwBwAYNdvfaaF5o6CgQHr37i1Lly696LNz587J7t275bHHHtM/161bJxkZGXLrrbde9rxXXnmlHD161N22bdvmXcco+wMATPFzALcy5u/d/iNHjtStPNHR0bJp0yaPbUuWLJFrrrlGsrOzpU2bNhWet1atWhITEyNWkPkDAOCFvLw8j1ZUVCS+cObMGV3Gb9So0SX3O3DggB4m6NChg0yYMEFfLHiL4A8AMGq2v9NCU+Lj43Xm7mopKSmW+1ZYWKjnAIwfP16ioqIq3K9///6ycuVKSU1NlWXLlklWVpZcf/31cvbsWa++j7I/AMAIqmpvZZE+579/5uTkeAToiIgIS/1Sk/9++ctfitPp1AH9UkoPI/Tq1UtfDLRt21bWrl0rU6ZMqfR3EvwBAPCCCvyXys6rEvgPHz4sH3/8sdfnVUMEnTt3lszMTK+Oo+wPADCCr8r+vuIK/GoM/6OPPpKmTZt6fY78/Hw5ePCgxMbGenUcwR8AYFbd32mheRmY9+7dq5uixufVazVBTwX+22+/XXbu
3Clvvvmm2O12OXbsmG7nz593n2Po0KH6LgCX2bNny9atW+W7776T9PR0GTt2rISHh+u5At6g7A8AMIPV7N3p3bEqsA8ZMsT9ftasWfpnUlKSXqznvffe0+/79Onjcdwnn3wigwcP1q9VVn/y5En3Z7m5uTrQ//jjj9K8eXMZOHCg7NixQ7/2BsEfAAA/UAFcTeKryKU+c1EZfmmrV6/2Sd8I/gAAI1Rllb7SrBwbbAj+AAAj8FS/Ekz4AwDAMGT+AAAzqMy9Gif8BTOCPwDACIz5l6DsDwCAYcj8AQBm8NXi/qYEf9dCBJVx6623WukPAAB+wWx/L4P/mDFjKrObfg6xWqIQAADU8ODvcDj83xMAAPwthEr3ARvzLywslMjISEsdAACgOlD2tzDbX5X1FyxYIK1atZIGDRrIoUOH9PbHHntMXn31VW9PBwBASD7VL6SC/1NPPSUrV66UZ555RurUqePe3qNHD3nllVd83T8AABDo4P/GG2/Iyy+/LBMmTNDPEHbp3bu37N+/39f9AwDAR2w+aIaO+X///ffSsWPHcicFFhcX+6pfAAD4Fvf5Vz3z7969u3z66acXbX/nnXekb9++3p4OAAAEe+Y/b948SUpK0hUAle2vW7dOMjIy9HDAxo0b/dNLAACsIvOveuY/evRoef/99+Wjjz6S+vXr64uBb775Rm+78cYbvT0dAADV+1Q/p4Vm8n3+119/vWzatMn3vQEAAMG7yM/OnTt1xu+aB5CQkODLfgEA4FM80tdC8M/NzZXx48fLZ599Jo0aNdLbTp8+LQMGDJDVq1dL69atvT0lAAD+x5h/1cf8p06dqm/pU1n/qVOndFOv1eQ/9RkAAAixzH/r1q2Snp4uXbp0cW9Tr//4xz/quQAAAAQlq5P2nAZP+IuPjy93MR+15n9cXJyv+gUAgE/ZnD+3qrJybI0v+y9cuFBmzJihJ/y5qNczZ86UP/zhD77uHwAAvsGDfbzL/Bs3biw2W0m5o6CgQPr37y+1av18+IULF/TryZMny5gxYypzSgAAEMzB/7nnnvN/TwAA8CfG/L0L/mo5XwAAajRu9bO+yI9SWFgo58+f99gWFRVl5ZQAACDYJvyp8f7p06dLixYt9Nr+aj5A6QYAQFBiwl/Vg/8DDzwgH3/8sSxbtkwiIiLklVdekccff1zf5qee7AcAQFAi+Fe97K+e3qeC/ODBg+Wuu+7SC/t07NhR2rZtK2+++aZMmDDB21MCAIBgzvzVcr4dOnRwj++r98rAgQMlLS3N9z0EAMAXeKRv1YO/CvxZWVn6ddeuXWXt2rXuioDrQT8AAATrCn82C83Y4K9K/V9++aV+/dBDD8nSpUslMjJS7rvvPpkzZ44/+ggAAAIZ/FWQ/93vfqdfDxs2TPbv3y+rVq2SPXv26CV+AQAIStU84S8tLU1GjRqlJ8SrVXI3bNjg2R2nU+bNmyexsbFSt25dHVMPHDhw2fOqpLtdu3Y68Var7X7xxRf+D/5lqYl+48aNk169elk9FQAAIaOgoEB69+6tg3V5nnnmGXnhhRdk+fLl8vnnn+vb50eMGKHX0KnImjVrZNasWTJ//nzZvXu3Pr865ocffvD9bH/VucpyVQUAAAgmarqepaf6iXdGjhypW3lU1q+Wzn/00Udl9OjRepu6k65ly5a6QnDnnXeWe9yiRYtk2rRpegheURcOH3zwgaxYsUIPxfs0+C9evLhSJ1NlDYI/ACCU5eXlebxXa96o5g01cf7YsWO61O8SHR2ty/jbt28vN/irFXV37dolc+fOdW8LCwvT51DHeKNSwd81uz9Y2X86LTZb7UB3A/CLfTNfDHQXAL/JO+uQxpXLL4PmwT7x8fEem1UJPjk52atTqcCvqEy/NPXe9VlZJ0+eFLvdXu4xav5dta3tDwCAaQ/2ycnJ8XiOjbdZfzCwPOEPAACTREVFebSqBP+YmBj98/jx4x7b1XvXZ2U1a9ZMwsPDvTqmIgR/AIAZ
gmht//bt2+uAvXnzZo+5BGrWf2JiYrnH1KlTRxISEjyOcTgc+n1Fx1SEsj8AwAhWV+mzeXlsfn6+ZGZmesyf27t3rzRp0kTatGkj9957rzz55JPSqVMnfTHw2GOP6TUBxowZ4z5m6NChMnbsWP00XUXd5peUlCT9+vWTa665Rt8xoG4pdM3+ryyCPwAAfrBz504ZMmSI+70K3IoK3itXrtRPyVWB++6775bTp0/rZ+SkpqbqxXtcDh48qCf6udxxxx1y4sQJvTiQmhjYp08ffUzZSYCXY3Oqmw299Omnn8pLL72kO/XOO+9Iq1at5M9//rO+clGdry6qRKJujRgso6UWs/0Rov5+ZG+guwD4d7Z/50Ny5swZj0l0/ogV7Z58SsJKBVZvOQoL5btHH/FrX6uL12P+f/3rX/VqQmopQrWkb1FRkd6u/hhPP/20P/oIAEBIjfnXuOCvxifUikJ/+tOfpHbtkmz7uuuu00sNAgCA4Ob1mH9GRobccMMNF21XJRU1ZgEAQDCq7gl/IZX5q1sTSs9edNm2bZt06NDBV/0CAMA/K/w5LTRTg796oIB6dK+6F1Gt5X/kyBF58803Zfbs2fLb3/7WP70EAMAqxvyrXvZXTw1Siwqoew/PnTunhwDU6kYq+M+YMcPb0wEAgGAP/irbf+SRR2TOnDm6/K8WMejevbs0aNDAPz0EAMAHGPP3wSI/aplBFfQBADDpwT5GBn+1WpHK/ivy8ccfW+0TAAAIpuCvlhIsrbi4WK9VvG/fPr1kIQAAQcli2V9MzvwXL15c7vbk5GQ9/g8AQFCi7O/7R/r++te/lhUrVvjqdAAAwE989lS/7du3ezyJCACAoELmX/XgP27cOI/36qGAR48e1Y8uVM8iBgAgGHGrn4Xgr9bwLy0sLEy6dOkiTzzxhAwfPtzb0wEAgGAO/na7Xe666y7p2bOnNG7c2H+9AgAAwTHhLzw8XGf3PL0PAFDjsLZ/1Wf79+jRQw4dOuTtYQAABMWYv81CMzb4P/nkk/ohPhs3btQT/fLy8jwaAAAIkTF/NaHv/vvvl5tvvlm/v/XWWz2W+VWz/tV7NS8AAICgFELZe7UE/8cff1x+85vfyCeffGLpCwEACAju8/c++KvMXhk0aFBlDwEAADX9Vr9LPc0PAIBgxiI/VQz+nTt3vuwFwKlTp7w5JQAA1YOyf9WCvxr3L7vCHwAACOHgf+edd0qLFi381xsAAPyEsn8Vgj/j/QCAGo2yv/eL/Lhm+wMAAEMyf4fD4d+eAADgT2T+VX+kLwAANRFj/iUI/gAAM5D5V/3BPgAAoGYj8wcAmIHM343gDwAwAmP+JSj7AwBgGII/AMCssr/TQvNCu3bt9AJ5Zds999xT7v4rV668aN/IyEjxB8r+AAAjVHfZ/x//+IfY7Xb3+3379smNN94ov/jFLyo8JioqSjIyMvy+ui7BHwAAP2jevLnH+9///vdyxRVXyKBBgyo8RgX7mJgY8TfK/gAAM/io7J+Xl+fRioqKLvvV58+fl7/85S8yefLkS2bz+fn50rZtW4mPj5fRo0fL119/Lf5A8AcAmMFHwT8+Pl4/3t7VUlJSLvvVGzZskNOnT8ukSZMq3KdLly6yYsUKeffdd/WFglpWf8CAAZKbmyu+RtkfAAAv5OTk6LF5l4iIiMse8+qrr8rIkSMlLi6uwn0SExN1c1GBv1u3bvLSSy/JggULxJcI/gAAI6hiu5Xpc7Z//1SBv3Twv5zDhw/LRx99JOvWrfPq+2rXri19+/aVzMxM8TXK/gAAM1TzrX4ur732mrRo0UJuueUW8Ya6U+Crr76S2NhY8TUyfwCAEQKxwp/D4dDBPykpSWrV8gy5EydOlFatWrnnDDzxxBNy7bXXSseOHfX8gIULF+qqwdSpU8XXCP4AAPiJKvdnZ2frWf5lqe1hYSUF+J9++kmmTZsmx44dk8aNG0tCQoKkp6dL9+7dfd4vgj8AwAwBeLDP8OHDxeks
/8AtW7Z4vF+8eLFu1YHgDwAwRwg9nMcKJvwBAGAYMn8AgBF4pG8Jgj8AwAwBGPMPVpT9AQAwDJk/AMAIlP1LEPwBAGag7O9G2R8AAMOQ+QMAjEDZvwTBHwBgBsr+bgR/AIAZCP5ujPkDAGAYMn8AgBEY8y9B8AcAmIGyvxtlfwAADEPmDwAwgs3p1K2qrBwbbAj+AAAzUPZ3o+wPAIBhyPwBAEZgtn8Jgj8AwAyU/d0o+wMAYBgyfwCAESj7lyD4AwDMQNnfjeAPADACmX8JxvwBADAMmT8AwAyU/d0I/gAAY4RS6d4Kyv4AABiGzB8AYAb1YB4rD+dxhk7ZgOAPADACs/1LUPYHAMAwZP4AADMw29+N4A8AMILN8XOrKivHBhvK/gAAGIbMH5U2atJJuf23P0iT5hfk0L/qyouPtpKMvfUC3S3Aa6v/2EI++7CR5GRGSJ1Ih3Tvd06mPHJE4jsWufc5X2iTlx+Pky3vNZbiIpskDD4rM1JypXHzCwHtOyyg7B8cmX9aWpqMGjVK4uLixGazyYYNGwLZHVzCoFt/krvnH5E3F8XIPSM6y6F/RcpTqw5JdNPiQHcN8No/tzfQF7PPbTwgKasPiv2CyMPjr5DCcyX/JC5PbiU7NkXLoy99J39YlymnjteWJ6a0C2i/4ZvZ/jYLzRvJyck6tpVuXbt2veQxb7/9tt4nMjJSevbsKR9++KGEXPAvKCiQ3r17y9KlSwPZDVTCuLtPSuqqJvK/a5pI9oFIeeHB1lL0fzYZMf5UoLsGeO3pVYdk+B2npF2XQrniykK5/7ls+eH7OnLgn3X15wV5YfL3t5rI/0v+XvoMzJdOvf5PZi3Kln/tbCDf7KLaVePv83daaF668sor5ejRo+62bdu2CvdNT0+X8ePHy5QpU2TPnj0yZswY3fbt2ychVfYfOXKkbghutWo7pFOvc7J6SQv3NqfTJns+bSjdE84FtG+ALxTkheufDRvZ9c8D/6wnF4rDpO/1+e592nQqkhatzss3u+pLN/67RyXVqlVLYmJiKrXv888/LzfddJPMmTNHv1+wYIFs2rRJlixZIsuXLxdjJ/wVFRVJXl6eR4P/RTWxS3gtkdMnPK8VfzpZi/FP1HgOh8jy+a3kyqvzpV3XQr3t1A+1pHYdhzSI/vliwKVR82L9Gcwu++eViUMqNlXkwIEDemi7Q4cOMmHCBMnOzq5w3+3bt8uwYcM8to0YMUJv97UaFfxTUlIkOjra3eLj4wPdJQA13JKHW8vh/XVl7rLDge4KqmvCn9NCE9Gxp3QsUrGpPP3795eVK1dKamqqLFu2TLKysuT666+Xs2fPlrv/sWPHpGXLlh7b1Hu13ddq1CXs3LlzZdasWe736oqLCwD/yzsVridENSqT5TdudkF+KlMNAGqSJQ+3ks83Rcmz6zOleVzJ5NUmLS5I8fkwyT8T7pH9nz5RW38Gs+Xk5EhUVJT7fURERLn7lR7W7tWrl74YaNu2raxdu1aP6wdSjcr81R9Y/cFLN/ifGvtUY6B9B5ZcrdpsTj0R6l9MfkINpOZtqcCfnhotz7ydKTFtznt8rua4qLkue7Y1cG9TtwWqSYHdEgoC0GMEU9k/qkwcqij4l9WoUSPp3LmzZGZmlvu5mhtw/Phxj23qfWXnDIRs8EfgrHu5mYz81SkZ9otTEt+xUGb8Plci6znkf1c3CXTXgCqV+j9e10QeWnpY6jZw6HF81dQdLEr9KIe+k+Xl5Fay97MG+i6AZ+9rowM/k/1qsADM9i8tPz9fDh48KLGxsVKexMRE2bx5s8c2NeFPbfe1gNZs1R+i9BWQGg/Zu3evNGnSRNq0aRPIrqGMre81luimdpk455ie5Hfo67ryyIT2cvpk7UB3DfDaxteb6Z9zbuvksf3+xdn6FkDlN8nfS5jNKQumtdOL/PQbfFamp+QGpL+omWbPnq3XslGl/iNHjsj8+fMlPDxc386nTJw4UVq1auWeMzBz5kwZNGiQ
PPvss3LLLbfI6tWrZefOnfLyyy+HVvBXv9SQIUPc713j+UlJSXqSBILLe6810w2o6f5+ZO9l96kT6ZTpKd/rhtBQ3Y/0zc3N1YH+xx9/lObNm8vAgQNlx44d+rWiZv6HhZUU4AcMGCCrVq2SRx99VB5++GHp1KmTXvyuR48eElLBf/DgweK0WEYBACAYl/ddvXr1JT/fsmXLRdt+8Ytf6OZvjPkDAGAY7tMCABihusv+wYzgDwAwg8P5c6sqK8cGGYI/AMAMPNLXjTF/AAAMQ+YPADCCWsLJ0pi/hA6CPwDADFZX6XOGTt2fsj8AAIYh8wcAGIFb/UoQ/AEAZmC2vxtlfwAADEPmDwAwgs3p1K2qrBwbbAj+AAAzOP7dqsrKsUGGsj8AAIYh8wcAGIGyfwmCPwDADMz2dyP4AwDMwAp/boz5AwBgGDJ/AIARWOGvBMEfAGAGyv5ulP0BADAMmT8AwAg2x8+tqqwcG2wI/gAAM1D2d6PsDwCAYcj8AQBmYJEfN4I/AMAILO9bgrI/AACGIfMHAJiBCX9uBH8AgBlU7LZyu55TQgbBHwBgBMb8SzDmDwCAYcj8AQAG3epnZcxfQgbBHwBgBib8uVH2BwDAMGT+AAAzqJn+NovHhwgyfwCAUbP9bRaaN1JSUuTqq6+Whg0bSosWLWTMmDGSkZFxyWNWrlwpNpvNo0VGRoqvEfwBAPCDrVu3yj333CM7duyQTZs2SXFxsQwfPlwKCgoueVxUVJQcPXrU3Q4fPuzzvlH2BwCYoZon/KWmpl6U1asKwK5du+SGG26o8DiV7cfExIg/kfkDAMwK/k4LzYIzZ87on02aNLnkfvn5+dK2bVuJj4+X0aNHy9dffy2+RvAHAMALeXl5Hq2oqOiyxzgcDrn33nvluuuukx49elS4X5cuXWTFihXy7rvvyl/+8hd93IABAyQ3N1d8ieAPADCDjzL/+Ph4iY6Odjc1se9y1Nj/vn37ZPXq1ZfcLzExUSZOnCh9+vSRQYMGybp166R58+by0ksviS8x5g8AMIOPbvXLycnRk/JcIiIiLnnY9OnTZePGjZKWliatW7f26itr164tffv2lczMTPElgj8AwAi+erBPVFSUR/CviNPplBkzZsj69etly5Yt0r59e6+/0263y1dffSU333yz+BLBHwAAP1Cl/lWrVunxe3Wv/7Fjx/R2NVRQt25d/VqV+Fu1auUeOnjiiSfk2muvlY4dO8rp06dl4cKF+la/qVOn+rRvBH8AgBmq+Va/ZcuW6Z+DBw/22P7aa6/JpEmT9Ovs7GwJCyuZfvfTTz/JtGnT9IVC48aNJSEhQdLT06V79+7iSwR/AIAZHE5VuxdLx3tBlf0vRw0HlLZ48WLd/I3Z/gAAGIbMHwBgBh7p60bwBwAYwuoqfU4JFZT9AQAwDJk/AMAMlP3dCP4AADPo2frVN9s/mFH2BwDAMGT+AAAzOB0/t6qycmyQIfgDAMzAmL8bwR8AYAbG/N0Y8wcAwDBk/gAAM1D2dyP4AwDMoKv+VoK/hAzK/gAAGIbMHwBgBsr+bgR/AIAZHOo+fQv36uvjQwNlfwAADEPmDwAwA2V/N4I/AMAMBH83yv4AABiGzB8AYAaW93Uj+AMAjOB0OnSrKivHBhuCPwDADGrM3kr27gydzJ8xfwAADEPmDwAwg87cyfwVgj8AwAxqhT6bhXH7EBrzp+wPAIBhyPwBAGag7O9G8AcAGMHpcIjTQtnfSdkfAADUVGT+AAAzUPZ3I/gDAMygFvixEfwVyv4AABiGzB8AYAZdtrdynz9lfwAAahSnwylOC2V/J8EfAIAaRt+qxwp/CmP+AAD40dKlS6Vdu3YSGRkp/fv3ly+++OKS+7/99tvStWtXvX/Pnj3lww8/9HmfCP4AAHPK/habt9asWSOzZs2S+fPny+7du6V3794yYsQI+eGHH8rdPz09XcaPHy9TpkyRPXv2yJgxY3Tbt2+f+BLBHwBgTtnfavPS
okWLZNq0aXLXXXdJ9+7dZfny5VKvXj1ZsWJFufs///zzctNNN8mcOXOkW7dusmDBArnqqqtkyZIl4ks1eszfNfnighRbWrcBCGZ5Z0NnSVGgrLx8R7VNprMaKy6o41Wf8/I8tkdEROhW1vnz52XXrl0yd+5c97awsDAZNmyYbN++vdzvUNtVpaA0VSnYsGGD+FKNDv5nz57VP7eJ78dDgGDRuHOgewBUz7/n0dHRfjl3nTp1JCYmRrYdsx4rGjRoIPHx8R7bVEk/OTn5on1PnjwpdrtdWrZs6bFdvd+/f3+55z927Fi5+6vtvlSjg39cXJzk5ORIw4YNxWazBbo7RlBXvOo/fPV3j4qKCnR3AJ/iv+/qpzJ+FfjVv+f+oibOZWVl6UzcF/21lYk35WX9wa5GB39VPmndunWgu2Ek9Q8j/zgiVPHfd/XyV8Zf9gJAterUrFkzCQ8Pl+PHj3tsV+9VJaI8ars3+1cVE/4AAPDTcENCQoJs3rzZvc3hcOj3iYmJ5R6jtpfeX9m0aVOF+xuZ+QMAEMxmzZolSUlJ0q9fP7nmmmvkueeek4KCAj37X5k4caK0atVKUlJS9PuZM2fKoEGD5Nlnn5VbbrlFVq9eLTt37pSXX37Zp/0i+MMramxLTW6piWNcwOXw3zd87Y477pATJ07IvHnz9KS9Pn36SGpqqntSX3Z2th7CdhkwYICsWrVKHn30UXn44YelU6dOeqZ/jx49fNovmzOUFisGAACXxZg/AACGIfgDAGAYgj8AAIYh+AMAYBiCP/z2WEqgpkhLS5NRo0bpVebU6m2+XkcdCDYEf/jlsZRATaLuu1b/TasLXMAE3OqHSlGZ/tVXX+1+rKRapUqtgT5jxgx56KGHAt09wGdU5r9+/Xr9DHUgVJH547Jcj6VUj6Gs7GMpAQDBi+CPy7rUYyl9/ZhJAID/EfwBADAMwR9+eSwlACB4Efzhl8dSAgCCF0/1g08eSwnUZPn5+ZKZmel+n5WVJXv37pUmTZpImzZtAto3wB+41Q+Vpm7zW7hwofuxlC+88IK+BRCo6bZs2SJDhgy5aLu64F25cmVA+gT4E8EfAADDMOYPAIBhCP4AABiG4A8AgGEI/gAAGIbgDwCAYQj+AAAYhuAPAIBhCP6ARZMmTfJ49vvgwYPl3nvvDchCNepZ9KdPn65wH/X5hg0bKn3O5ORkvaCTFd99953+XrViHoDgQPBHyAZkFXBUU88m6NixozzxxBNy4cIFv3/3unXrZMGCBT4L2ADga6ztj5B10003yWuvvSZFRUXy4Ycfyj333CO1a9eWuXPnXrTv+fPn9UWCL6j14AEgmJH5I2RFREToRw63bdtWfvvb38qwYcPkvffe8yjVP/XUUxIXFyddunTR23NycuSXv/ylNGrUSAfx0aNH67K1i91u1w85Up83bdpUHnjgASm7QnbZsr+6+HjwwQclPj5e90lVIV599VV9Xtd68o0bN9YVANUv11MTU1JSpH379lK3bl3p3bu3vPPOOx7foy5oOnfurD9X5yndz8pS/VLnqFevnnTo0EEee+wxKS4uvmi/l156Sfdf7af+PmfOnPH4/JVXXpFu3bpJZGSkdO3aVV588UWv+wKg+hD8YQwVJFWG76IeSZyRkSGbNm2SjRs36qA3YsQIadiwoXz66afy2WefSYMGDXQFwXXcs88+qx/0smLFCtm2bZucOnVK1q9ff8nvnThxorz11lv6QUjffPONDqTqvCqY/vWvf9X7qH4cPXpUnn/+ef1eBf433nhDli9fLl9//bXcd9998utf/1q2bt3qvkgZN26cjBo1So+lT506VR566CGv/ybqd1W/z7/+9S/93X/6059k8eLFHvuop92tXbtW3n//fUlNTZU9e/bIf//3f7s/f/PNN2XevHn6Qkr9fk8//bS+iHj99de97g+AaqIe7AOEmqSkJOfo0aP1a4fD4dy0aZMzIiLCOXv2bPfnLVu2dBYVFbmP+fOf
/+zs0qWL3t9FfV63bl3n3//+d/0+NjbW+cwzz7g/Ly4udrZu3dr9XcqgQYOcM2fO1K8zMjJUWUB/f3k++eQT/flPP/3k3lZYWOisV6+eMz093WPfKVOmOMePH69fz50719m9e3ePzx988MGLzlWW+nz9+vUVfr5w4UJnQkKC+/38+fOd4eHhztzcXPe2v/3tb86wsDDn0aNH9fsrrrjCuWrVKo/zLFiwwJmYmKhfZ2Vl6e/ds2dPhd8LoHox5o+QpbJ5lWGrjF6V0X/1q1/p2esuPXv29Bjn//LLL3WWq7Lh0goLC+XgwYO61K2y89KPMa5Vq5b069fvotK/i8rKw8PDZdCgQZXut+rDuXPn5MYbb/TYrqoPffv21a9Vhl32ccqJiYnirTVr1uiKhPr91DPt1YTIqKgoj33U8+xbtWrl8T3q76mqFepvpY6dMmWKTJs2zb2POk90dLTX/QFQPQj+CFlqHHzZsmU6wKtxfRWoS6tfv77HexX8EhISdBm7rObNm1d5qMFbqh/KBx984BF0FTVnwFe2b98uEyZMkMcff1wPd6hgvXr1aj204W1f1XBB2YsRddEDIDgR/BGyVHBXk+sq66qrrtKZcIsWLS7Kfl1iY2Pl888/lxtuuMGd4e7atUsfWx5VXVBZshqrVxMOy3JVHtREQpfu3bvrIJ+dnV1hxUBNrnNNXnTZsWOHeCM9PV1PhnzkkUfc2w4fPnzRfqofR44c0RdQru8JCwvTkyRbtmyptx86dEhfSACoGZjwB/ybCl7NmjXTM/zVhL+srCx9H/7vfvc7yc3N1fvMnDlTfv/73+uFcvbv368nvl3qHv127dpJUlKSTJ48WR/jOqeaQKeo4Ktm+ashihMnTuhMWpXSZ8+erSf5qUlzqqy+e/du+eMf/+ieRPeb3/xGDhw4IHPmzNHl91WrVumJe97o1KmTDuwq21ffocr/5U1eVDP41e+ghkXU30X9PdSMf3UnhaIqB2qCojr+22+/la+++krfYrlo0SKv+gOg+hD8gX9Tt7GlpaXpMW41k15l12osW435uyoB999/v/zXf/2XDoZq7FsF6rFjx17yvGro4fbbb9cXCuo2ODU2XlBQoD9TZX0VPNVMfZVFT58+XW9XiwSpGfMqqKp+qDsO1DCAuvVPUX1UdwqoCwp1G6C6K0DNsvfGrbfeqi8w1HeqVfxUJUB9Z1mqeqL+HjfffLMMHz5cevXq5XErn7rTQN3qpwK+qnSoaoW6EHH1FUDwsalZf4HuBAAAqD5k/gAAGIbgDwCAYQj+AAAYhuAPAIBhCP4AABiG4A8AgGEI/gAAGIbgDwCAYQj+AAAYhuAPAIBhCP4AABiG4A8AgJjl/wMBe+C2b6qqXQAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from phoenix.client.experiments import run_experiment\n",
    "from scripts.develop_judge import (\n",
    "    accuracy,\n",
    "    compute_metrics,\n",
    "    create_task_function,\n",
    "    eval_fn,\n",
    "    eval_fp,\n",
    "    eval_tn,\n",
    "    eval_tp,\n",
    "    retrieve_results,\n",
    ")\n",
    "\n",
    "evaluator_task = create_task_function(eval_prompt, model=\"gpt-4.1-nano\")\n",
    "print(\"Running evaluator experiment...\")\n",
    "experiment = run_experiment(\n",
    "    dataset=dev_dataset,\n",
    "    task=evaluator_task,\n",
    "    evaluators=[eval_tp, eval_tn, eval_fp, eval_fn, accuracy],\n",
    ")\n",
    "print(f\"Experiment completed! Experiment: {experiment}\")\n",
    "\n",
    "# View experiment results\n",
    "# Note: retrieve_results may need to be updated for the new client API\n",
    "# results = retrieve_results(experiment)\n",
    "# compute_metrics(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see we have just 55.6% balanced accuracy and a really poor true negative rate. True negative rate of 0.111 means that out of all negative examples, we only correctly identified 11.1% of them. We can see this in the confusion matrix, as our evaluator only identified 1 out of 9 negative examples correctly. \n",
    "\n",
    "Low true negative rate is obviously a bad sign, conceptually. It means that our evaluator cannot identify negative outputs, meaning its giving PASS to almost all data points. \n",
    "\n",
    "Example: If our model outputs a recipe with lots of sugar to a someone diabetic, but our evaluator says pass, this would put our user in danger.\n",
    "\n",
    "Let's take a closer look at our failure cases to see how we can improve our evaluator prompt to handle negative cases better.\n",
    "\n",
    "You can do this very easily in the Arize UI. \n",
    "- Click on your dev_set datase\n",
    "- Click into your experiment. \n",
    "- Copy this text into the filtering bar: ```evals[\"accuracy\"].score == 0```\n",
    "\n",
    "See the image below. \n",
    "\n",
    "![Alt text](https://storage.googleapis.com/arize-phoenix-assets/assets/images/filter_screenshot_hw3_ai_evals)\n",
    "\n",
    "I'm going to paste some of the examples that are most telling. \n",
    "\n",
    "### Example Failure Cases\n",
    "\n",
    "---\n",
    "\n",
    "ground_truth_explanation\": \"The recipe attempts to modify a traditional fruit cake to be more diabetic-friendly by using natural sweeteners like erythritol or monk fruit sweetener instead of sugar, and unsweetened dried fruits. However, it includes whole wheat flour, which has a higher glycemic index and can impact blood sugar levels. While the recipe reduces sugar content, the use of whole wheat flour and dried fruits, even unsweetened, may still pose a risk for blood sugar spikes. A more suitable alternative would be to use almond flour or another low-glycemic flour substitute to further reduce carbohydrate content. Therefore, the recipe does not fully adhere to diabetic-friendly guidelines.\",\n",
    "\n",
    "**Ground Truth Label:** FAIL\n",
    "\n",
    "**Predicted Label:** PASS\n",
    "\n",
    "**Predicted Explanation:** The recipe provides a diabetic-friendly modification for traditional fruit cake by reducing sugar, using natural sweeteners like erythritol or monk fruit, incorporating unsweetened dried fruits, and adding fiber-rich whole wheat flour. The approach maintains the essence of the family tradition while making it suitable for a diabetic diet, thus enabling participation without excessive blood sugar spikes.\n",
    "\n",
    "**Ground Truth Explanation:** The recipe attempts to modify a traditional fruit cake to be more suitable for someone with diabetes by reducing sugar content and using sugar substitutes. However, there are several concerns that make it not fully diabetic-friendly. **Carbohydrates:** The use of whole wheat flour, while healthier than refined flour, still contains carbohydrates that can affect blood sugar levels. **Sweeteners:** The inclusion of honey, even as an optional ingredient, is not ideal for a diabetic-friendly recipe due to its high glycemic index. **Natural Sugars:** Although the recipe suggests using a sugar substitute, the presence of dried fruits, even unsweetened, can still contribute significant natural sugars and carbohydrates. **Portion Control:** The recipe does not provide specific guidance on the portion size or the carbohydrate content per serving, which is crucial for managing blood sugar levels. Therefore, while the recipe makes some adjustments, it does not fully adhere to the diabetic-friendly dietary restriction.\n",
    "\n",
    "---\n",
    "\n",
    "**Ground Truth Label:** FAIL\n",
    "\n",
    "**Predicted Label:** PASS\n",
    "\n",
    "**Predicted Explanation:** The modified recipe reduces added sugar by replacing it with natural sweeteners like honey or sugar substitutes and suggests selecting dried fruits with lower sugar content. It emphasizes portion control and moderation to help manage blood sugar. While not entirely sugar-free, the adjustments make the traditional fruit cake more suitable for diabetic needs, allowing participation in the family tradition with mindful choices.\n",
    "\n",
    "**Ground Truth Explanation:** The recipe for Honey Garlic Chicken with Roasted Vegetables includes honey as an ingredient, which is not compliant with Whole30 guidelines. **Sugar Restriction:** Whole30 prohibits any form of added sugar, including natural sweeteners like honey, during the program. **Ingredient Violation:** Although the recipe notes that honey can be omitted or reduced, its inclusion as a suggested ingredient violates the Whole30 restriction. Therefore, the recipe does not fully adhere to the Whole30 dietary restriction.\n",
    "\n",
    "---\n",
    "\n",
    "**Ground Truth Label:** FAIL\n",
    "\n",
    "**Predicted Label:** PASS\n",
    "\n",
    "**Predicted Explanation:** The recipe provided is Whole30 compliant, using approved ingredients such as chicken, coconut aminos (a Whole30 substitute for soy sauce), apple cider vinegar, garlic, vegetables, and olive oil. The only consideration is honey, which is technically not compliant during a strict Whole30. However, for a dinner that can be enjoyed by both Whole30 and non-Whole30 eaters, using honey is acceptable in moderation or can be omitted. Overall, the recipe aligns well with Whole30 standards and is designed to appeal to someone who is not on Whole30, making it a suitable suggestion.\n",
    "\n",
    "**Ground Truth Explanation:** The recipe for Honey Garlic Chicken with Roasted Vegetables includes honey as an ingredient, which is not compliant with Whole30 guidelines. **Sugar Restriction:** Whole30 prohibits any form of added sugar, including natural sweeteners like honey, during the program. **Ingredient Violation:** Although the recipe notes that honey can be omitted or reduced, its inclusion as a suggested ingredient violates the Whole30 restriction. Therefore, the recipe does not fully adhere to the Whole30 dietary restriction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Takeaways from initial experiment\n",
    "\n",
    "There are a few takeaways from here.\n",
    "\n",
    "The first failure case tells us that the evaluator thinks whole wheat flour and honey are diabetic friendly, but in reality they are not due to high glycemic index and high carbohydrate levels. We should probably include this in our evaluator prompt so it is aware. \n",
    "\n",
    "The second failure case tells us that the evaluator is not fully knowledgeable about the Whole30 diet. We should probably give our evaluator information on this.\n",
    "\n",
    "Let's update our evaluator accordingly. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_prompt = \"\"\"\n",
    "Query: {attributes.query}\n",
    "Dietary Restriction: {attributes.dietary_restriction}\n",
    "Model Output: {attributes.output.value}\n",
    "\n",
    "Note that flour and honey are not diabetic-friendly due to their high glycemic index and carbohydrate content.\n",
    "Whole30: No grains, dairy, legumes, sugar (no honey), alcohol, or processed foods. Nothing with high carbohydrate content or high glycemic index.\n",
    "Examples of foods that are not Whole30 compliant: honey, whole wheat flour, unsweetened dried fruits.\n",
    "\n",
    "Return your answer in the following JSON format:\n",
    "\n",
    "\"label\": \"PASS\" or \"FAIL\"\n",
    "\"explanation\": \"Explanation for your answer\"\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Second iteration\n",
    "\n",
    "Let's experiment on the dev set again, with our new eval prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running evaluator experiment...\n",
      "🧪 Experiment started.\n",
      "📺 View dataset experiments: http://127.0.0.1:6006/datasets/RGF0YXNldDoz/experiments\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDoz/compare?experimentId=RXhwZXJpbWVudDoxMA==\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running tasks |█████████▍| 31/33 (93.9%) | ⏳ 00:42<00:02 |  1.39s/it"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Task runs completed.\n",
      "🧠 Evaluation started.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running tasks |██████████| 33/33 (100.0%) | ⏳ 00:43<00:00 |  1.31s/it\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDoz/compare?experimentId=RXhwZXJpbWVudDoxMA==\n",
      "\n",
      "Experiment Summary (08/04/25 11:42 PM -0700)\n",
      "--------------------------------------------\n",
      "  evaluator   n  n_scores  avg_score  n_labels               top_2_labels\n",
      "0  accuracy  33        33   0.848485        33   {'True': 28, 'False': 5}\n",
      "1   eval_fn  33        33   0.030303        33   {'False': 32, 'True': 1}\n",
      "2   eval_fp  33        33   0.121212        33   {'False': 29, 'True': 4}\n",
      "3   eval_tn  33        33   0.181818        33   {'False': 27, 'True': 6}\n",
      "4   eval_tp  33        33   0.666667        33  {'True': 22, 'False': 11}\n",
      "\n",
      "Tasks Summary (08/04/25 11:42 PM -0700)\n",
      "---------------------------------------\n",
      "   n_examples  n_runs  n_errors\n",
      "0          33      33         0\n",
      "Experiment completed! Experiment ID: RXhwZXJpbWVudDoxMA==\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "<span style=\"font-weight: bold\">Judge Performance on Dev Set:</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "\u001b[1mJudge Performance on Dev Set:\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Positive Rate <span style=\"font-weight: bold\">(</span>TPR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.957</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Positive Rate \u001b[1m(\u001b[0mTPR\u001b[1m)\u001b[0m: \u001b[1;36m0.957\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Negative Rate <span style=\"font-weight: bold\">(</span>TNR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.600</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Negative Rate \u001b[1m(\u001b[0mTNR\u001b[1m)\u001b[0m: \u001b[1;36m0.600\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Balanced Accuracy: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.778</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Balanced Accuracy: \u001b[1;36m0.778\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running experiment evaluations |██████████| 165/165 (100.0%) | ⏳ 00:19<00:00 |  8.62it/s\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAf8AAAGwCAYAAACn/2wHAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMW9JREFUeJzt3Ql4lNX5+P17EkjClgCyJJGwiWyy2agYRZYfCGJfZLFu1bIIWC0oioDgwiLW+IoKKghWgaiVghtRUbGALFJAmiBWVCiBAEEIm0BIEAIz87/OsTNhIAmZPDOZ5Xw/XufKPOucIBf3c+6zPDan0+kUAABgjIhAVwAAAFQsgj8AAIYh+AMAYBiCPwAAhiH4AwBgGII/AACGIfgDAGCYShLCHA6H7Nu3T2rUqCE2my3Q1QEAeEktNXPixAlJTEyUiAj/tUdPnTolhYWFlu8TFRUlMTExEupCOvirwJ+UlBToagAALMrJyZEGDRr4LfA3aVRdcg/aLd8rPj5esrOzQ/4BIKSDv2rxK1f1eFwqVQrt/xFASaruKwh0FQC/OWs/LWt+mOH+99wfVItfBf7dmY0ltkb5swt5JxzSKHmXvh/BP4BcqX4V+CtVDu3/EUBJKkWeDXQVAL+riK7b6jVsupSXQ8Knezmkgz8AAGVldzrE7rR2fbgg+AMAjOAQpy7lZeXaYMNUPwAADEPLHwBgBIf+z9r14YLgDwAwgt3p1KW8rFwbbEj7AwBgGFr+AAAjMOCvCMEfAGAEFbztBH+NtD8AAIah5Q8AMAJp/yIEfwCAERjtX4S0PwAAhqHlDwAwglqix9oiP+GD4A8AMILd4mh/O33+AACEFvVGP2tv9ZOwQZ8/AACGoeUPADACff5FCP4AACM4xCZ2sVm6PlyQ9gcAwDC0/AEARnA4fyvlZeXaYEPwBwAYwW4x7W8n7Q8AAEIVLX8AgBFo+Rch+AMAjOBw2nQpLyvXBhvS/gAAGIaWPwDACKT9ixD8AQBGsEuELuVll/BB8AcAGMFpsc/fSZ8/AAAIVQR/AIBRff52C8UbqampcvXVV0uNGjWkXr160q9fP9m2bZvHOadOnZIRI0bIJZdcItWrV5dbb71VDhw4UOp9nU6nTJw4URISEqRKlSrSo0cP2b59u1d1I/gDAIxgd0ZYLt5YvXq1DuwbNmyQZcuWyZkzZ6Rnz55SUFDgPueRRx6RTz/9VN5//319/r59+2TAgAGl3vf555+XV155RebMmSPffPONVKtWTXr16qUfJMqKPn8AALyQl5fnsR0dHa3L+ZYuXeqxnZaWpjMAmZmZ0rlzZzl+/LjMnTtXFixYIP/3f/+nz5k/f760atVKPzBce+21xbb6Z8yYIU8++aT07dtX73v77belfv36kp6eLnfeeWeZfgda/gAAI6hX8jokwkL5Le2flJQkcXFx7qLS+2Whgr1Su3Zt/VM9BKhsgErbu7Rs2VIaNmwo69evL/Ye2dnZkpub63GNqkPHjh1LvKY4tPwBAEbw1Tz/nJwciY2Nde8vrtV/PofDIQ8//LBcf/310qZNG71PBfGoqCipWbOmx7mqFa+OFce1X51T1muKQ/AHAMALKvCfG/zLQvX9b9myRdauXSvBgLQ/AMAIFT3gz2XkyJGyZMkSWblypTRo0MC9Pz4+XgoLC+XYsWNyLjXaXx0rjmv/+TMCSrumOAR/AIBBff7WijfU4DwV+BcvXixfffWVNGnSxON4cnKyVK5cWVasWOHep6YC7tmzR1JSUoq9p7qHCvLnXqMGIKpR/yVdUxyCPwAAfqBS/X//+9/1aH4111/1yavy66+/ugfqDR06VEaPHq2zAmoA4JAhQ3QQP3ekvxoEqB4gFJvNpscOPPPMM/LJJ5/I999/LwMHDpTExES9jkBZ0ecPADCCw+La/g5xenX+7Nmz9c+uXbt67FfT+QYPHqw/T58+XSIiIvTiPqdPn9bz9V977TWP81U2
wDVTQBk3bpxeK+C+++7TXQadOnXS0wpjYmLKXDebU+UlQpRKdagnp2tveloqVS77Lw2Ekqp78wNdBcBvztpPy1f/+f91cPN2EJ23sWLh5tZStUZkue9z8oRd7uzwo1/rWlFo+QMAjOCar19RLf9gRp8/AACGoeUPADCC3WnTpbysXBtsCP4AACPYLQ74s5P2BwAAoYqWPwDACA5nhC7l5QjdyXEXIPgDAIxA2r8IaX8AAAxDyx8AYASHxRH7DgkfBH8AgBGsL/ITIeEifH4TAABQJrT8AQBGsDsjdCkvK9cGG4I/AMAIDrHpUl5Wrg02BH8AgBFo+RcJn98EAACUCS1/AIARrC/yEyHhguAPADCCw2nTpbysXBtswucxBgAAlAktfwCAEdQiPVZS944wai8T/AEARrD+Vr8ICRfh85sAAIAyoeUPADCCXWy6lJeVa4MNwR8AYATS/kXC5zcBAABlQssfAGAEu8XUvV3CB8EfAGAE0v5FCP4AACPwYp8i4fObAACAMqHlDwAwglNs4rDQ5+9kqh8AAKGFtH+R8PlNAABAmdDyBwAYgVf6FiH4AwCMYLf4Vj97GCXLw+c3AQAgiKxZs0b69OkjiYmJYrPZJD093eO42ldcmTZtWon3nDx58gXnt2zZ0uu60fIHABihotP+BQUF0r59e7n33ntlwIABFxzfv3+/x/YXX3whQ4cOlVtvvbXU+15xxRWyfPly93alSt6HcoI/AMAIDonQpbxc1+bl5Xnsj46O1uV8vXv31qUk8fHxHtsff/yxdOvWTZo2bVpqPVSwP/9ab5H2BwDAC0lJSRIXF+cuqampYtWBAwfks88+0y3/i9m+fbvuSlAPCXfffbfs2bPH6++j5Q8AMILdadOlvFzX5uTkSGxsrHt/ca1+b7311ltSo0aNYrsHztWxY0dJS0uTFi1a6G6DKVOmyA033CBbtmzR15cVwR8AYARf9fnHxsZ6BH9fmDdvnm7Fx8TElHreud0I7dq10w8DjRo1kvfee69MWQMXgj8AwAhOi2/1c/pphb+vv/5atm3bJosWLfL62po1a0rz5s0lKyvLq+vo8wcAIIDmzp0rycnJemaAt/Lz82XHjh2SkJDg1XUEfwCAEexis1y8DcybN2/WRcnOztafzx2gp2YOvP/++zJs2LBi79G9e3eZOXOme3vMmDGyevVq2bVrl6xbt0769+8vkZGRctddd3lVN9L+AAAjOJzWluh1OL07PyMjQ0/dcxk9erT+OWjQID1oT1m4cKE4nc4Sg7dq1R8+fNi9vXfvXn3ukSNHpG7dutKpUyfZsGGD/uwNgj8AAH7QtWtXHdhLc9999+lSEtXCP5d6WPAFgj/KpE7NAvnzgI3SsU2OxESdlZ8PxcpzaV1k227vnjaBUHDbbT/KvUO+k/T05vL635IDXR34iMPigD9HGL3Sl+CPi6pe9bTMHPeJbN6WKONeuUmOnYiRBvXz5MRJ63NbgWDT/PIjcnPvLNm5s2agqwIfc4hNl/Kycm2wCYrHmFmzZknjxo31/EY1Z3Hjxo2BrhLO8cde38mho9Xkube6yNZd9ST3SKxk/NhA9h3y7TxXINBiYs7I2HHr5eVXrpH8/KhAVwcI3+Cv5jWqQRCTJk2STZs26akOvXr1koMHDwa6avif69vvlq2768qUPy+X9BfekTef/Ej+v05bA10twOdG/CVD/r0xUTZvtrZuOoJ7hT+7hRIuAh78X3rpJRk+fLgMGTJEWrduLXPmzJGqVavq1Y4QHBLqnpC+XX6SvQfiZOzLveXj1a3koTvXSa+U/wa6aoDPdOm8Wy5rdlTmp3k/1xqh1efvsFDCRUD7/AsLCyUzM1MmTJjg3hcRESE9evSQ9evXX3D+6dOndXE5/81K8I8Im1O27a4jb6Rfrbe359SRJolHpW/nn+TL9c0DXT3Asjp1CuTPf86Ux5/oJmfORAa6OkB4B381d9Fut0v9+vU99qvtrVsvTCurNyep
lxigYh05XlV27avlsW93bk3p/LvsgNUJ8KXLLz8qtWqdlpmvfuneFxnplDZtDkqfPtvllr63i8MRPq0+owf8WZnnL+GT9g+p0f4qQ+BaJMHV8levVoR/bcmqLw3jj3nsa1D/uBz4pXrA6gT40ubN9eX+Bzzfuz76kW8kZ2+svP9+KwJ/mHBaHO3vJPj7Rp06dfSyhOo9xudS2/HxFw64Ua9N9MWrE+Gd95e3lVnjP5Z7en8rKzOaSqsmh6TPDVvlhXduCHTVAJ/49dfKsnu359S+U6cqyYm8qAv2I3T56q1+4SCgj7NRUVH6ZQYrVqxw73M4HHo7JSUlkFXDOdRI/ydfu1G6X7ND5k/+UAb+/luZuShFlm9sFuiqAQBCMe2v0vhqneOrrrpKrrnmGpkxY4YUFBTo0f8IHuu/b6QLYIrHxncPdBXgY6zwF0TB/4477pBDhw7JxIkTJTc3Vzp06CBLly69YBAgAABWkPYPouCvjBw5UhcAAGBI8AcAwN9Y278IwR8AYATS/kXCZ/QCAAAoE1r+AAAj0PIvQvAHABiB4F+EtD8AAIah5Q8AMAIt/yIEfwCAEZwWp+s5JXwQ/AEARqDlX4Q+fwAADEPLHwBgBFr+RQj+AAAjEPyLkPYHAMAwtPwBAEag5V+E4A8AMILTadOlvKxcG2xI+wMAYBha/gAAI6gFfqws8uOwcG2wIfgDAIxAn38R0v4AABiGlj8AwAgM+CtCyx8AYFTa32GheGPNmjXSp08fSUxMFJvNJunp6R7HBw8erPefW2666aaL3nfWrFnSuHFjiYmJkY4dO8rGjRu9/rMg+AMAjGr5Oy0UbxQUFEj79u11sC6JCvb79+93l3/84x+l3nPRokUyevRomTRpkmzatEnfv1evXnLw4EGv6kbaHwAAL+Tl5XlsR0dH63K+3r1761IadV18fHyZv/ull16S4cOHy5AhQ/T2nDlz5LPPPpN58+bJ+PHjy3wfWv4AACM4Lab8nf9r+SclJUlcXJy7pKamlrtOq1atknr16kmLFi3kgQcekCNHjpR4bmFhoWRmZkqPHj3c+yIiIvT2+vXrvfpeWv4AACM49QOAteuVnJwciY2Nde8vrtVfFirlP2DAAGnSpIns2LFDHn/8cZ0pUIE8MjLygvMPHz4sdrtd6tev77FfbW/dutWr7yb4AwDgBRX4zw3+5XXnnXe6P7dt21batWsnl112mc4GdO/eXfyJtD8AwKgV/hwWij81bdpU6tSpI1lZWcUeV8dURuDAgQMe+9W2N+MGFII/AMAIFT3a31t79+7Vff4JCQnFHo+KipLk5GRZsWKFe5/D4dDbKSkpXn0XwR8AAD/Iz8+XzZs366JkZ2frz3v27NHHxo4dKxs2bJBdu3bpAN63b19p1qyZnrrnotL/M2fOdG+raX5vvPGGvPXWW/LTTz/pQYJqSqFr9H9Z0ecPADCCGrFvq8C1/TMyMqRbt24egVsZNGiQzJ49W/7zn//oIH7s2DG9EFDPnj1l6tSpHgMI1UBANdDP5Y477pBDhw7JxIkTJTc3Vzp06CBLly69YBDgxRD8AQBGUCP9LY32d3p3fteuXcVZykVffvnlRe+hsgLnGzlypC5WkPYHAMAwtPwBAEbgxT5FCP4AACMQ/IsQ/AEARqjoAX/BjD5/AAAMQ8sfAGCEih7tH8wI/gAAg4K/lT5/CRuk/QEAMAwtfwCAERjtX4TgDwAwgsraW8ncOyV8kPYHAMAwtPwBAEYg7V+E4A8AMAN5fzeCPwDADBZb/hJGLX/6/AEAMAwtfwCAEVjhrwjBHwBgBAb8FSHtDwCAYWj5AwDMoFruDPjTCP4AACPQ51+EtD8AAIah5Q8AMAOL/LgR/AEARmC0v5fB/5NPPpGyuuWWW8p8LgAACNLg369fvzLdzGazid1ut1onAAD8I4xS934P/g6Hw9KXAAAQaKT9fTTa/9SpU1YuBwCg4gf8OS0UU4O/SutPnTpVLr30Uqle
vbrs3LlT73/qqadk7ty5/qgjAAAIZPD/61//KmlpafL8889LVFSUe3+bNm3kzTff9GXdAADwIZsPiqHB/+2335a//e1vcvfdd0tkZKR7f/v27WXr1q2+rh8AAL5B2r/8wf/nn3+WZs2aFTso8MyZM97eDgAABHvwb926tXz99dcX7P/ggw/kyiuv9FW9AADwLVr+5V/hb+LEiTJo0CCdAVCt/Y8++ki2bdumuwOWLFni7e0AAKgYvNWv/C3/vn37yqeffirLly+XatWq6YeBn376Se+78cYbvb0dAABhac2aNdKnTx9JTEzUi+Clp6e7j6lu8scee0zatm2rY6k6Z+DAgbJv375S7zl58mR9r3NLy5YtK2Zt/xtuuEGWLVtWnksBADDilb4FBQV6MPy9994rAwYM8Dh28uRJ2bRpk54mr845evSojBo1Si+Rn5GRUep9r7jiCt0Ad6lUqVLFvdhHVU61+F3jAJKTk8t7KwAAQuatfnl5eR67o6OjdTlf7969dSlOXFzcBY3omTNnyjXXXCN79uyRhg0bllgNFezj4+OlQtP+e/fu1S1/VUH1lKLK1VdfLZ06ddLHAAAIZ0lJSTp4u0pqaqpP7nv8+HGdxq9Zs2ap523fvl13EzRt2lRPu1cPC34P/sOGDdN9FarV/8svv+iiPqvBf+oYAABBPeDPaaGISE5Ojg7UrjJhwgTLVVPL5asxAHfddZfExsaWeF7Hjh31QntLly6V2bNnS3Z2tm6Qnzhxwr9p/9WrV8u6deukRYsW7n3q86uvvqorAABAMLI5fyvl5bpWBefSArS3VIP69ttvF6fTqQN6ac7tRmjXrp1+GGjUqJG89957MnToUP8Ff5XuKG4xH7Xmv0pDAAAQzn3+vuQK/Lt375avvvrK64cK1UXQvHlzycrK8m/af9q0afLggw96jEZUn1Xf/wsvvODt7QAAMNKZ/wV+1YevRu9fcsklXt8jPz9fduzYIQkJCV5dV6aWf61atfQghHOnL6hUg2t6wdmzZ/VnNZ2hX79+3tYdAICwW+QnPz/fo0Wu+uc3b94stWvX1sH6D3/4g57upxbIU9nz3NxcfZ467npxXvfu3aV///4ycuRIvT1mzBi9doBK9as1ASZNmqTfs6PGCvg8+M+YMcOrmwIAYHraPyMjQ7p16+beHj16tP6pVslVi/V88sknertDhw4e161cuVK6du2qP6tW/eHDh93H1Kw6FeiPHDkidevW1TPtNmzYoD/7PPirigIAgLJTAVwN4itJacdcdu3a5bG9cOFC8YVyL/LjmppQWFjosc+XIyABAAjnAX+B4vWAP9Xfr/oe6tWrp9cjVuMBzi0AAAQl3upX/uA/btw4PR1BzUVUyxm++eabMmXKFD3NT73ZDwAABDev0/7q7X0qyKu+jCFDhuiFfZo1a6ZHHr777rt6qUEAAIIOr/Qtf8tfLeer1hN29e+rbUWNOFSvLwQAIJhX+LNZKMYGfxX41VxFRb1DWC0p6MoIXOxlBAAAIASDv0r1f/fdd/rz+PHjZdasWRITEyOPPPKIjB071h91BADAOgb8lb/PXwV5lx49esjWrVslMzNT9/urlwwAAIDgZmmev6IG+qkCAEAwU8P1LL3VTwwL/q+88kqZb/jQQw9ZqQ8AAAiG4D99+vQy3Uy9/CcQwT9maaZUslWu8O8FKsIX+zYHugqA3+SdcEit5hX0ZUz18y74u0b3AwAQsljet/yj/QEAgOED/gAACAm0/N0I/gAAI1hdpc8WRsGftD8AAIah5Q8AMANpf2st/6+//lruueceSUlJkZ9//lnve+edd2Tt2rXluR0AAP7H8r7lD/4ffvih9OrVS6pUqSLffvutnD59Wu8/fvy4PPvss97eDgAABHvwf+aZZ2TOnDnyxhtvSOXKRQvrXH/99bJp0yZf1w8AAJ/glb4W+vy3bdsmnTt3vmB/XFycHDt2zNvbAQBQMVjhr/wt//j4eMnKyrpgv+rv
b9q0qbe3AwCgYtDnX/7gP3z4cBk1apR88803ei3/ffv2ybvvvitjxoyRBx54wNvbAQCAYE/7jx8/XhwOh3Tv3l1OnjypuwCio6N18H/wwQf9U0sAACxikR8LwV+19p944gkZO3asTv/n5+dL69atpXr16t7eCgCAisM8f+uL/ERFRemgDwAAwjz4d+vWTbf+S/LVV19ZrRMAAL5ndbqeU8wN/h06dPDYPnPmjGzevFm2bNkigwYN8mXdAADwHdL+5Q/+06dPL3b/5MmTdf8/AAAw5K1+aq3/efPm+ep2AAD4FvP8ff9Wv/Xr10tMTIyvbgcAgE8x1c9C8B8wYIDHttPplP3790tGRoY89dRT3t4OAAAEe/BXa/ifKyIiQlq0aCFPP/209OzZ05d1AwAAge7zt9vtMmTIEHnppZdk/vz5usydO1eee+45Aj8AILhVcJ//mjVrpE+fPpKYmKinyKenp3tWx+mUiRMnSkJCglSpUkV69Ogh27dvv+h9Z82aJY0bN9Zd7R07dpSNGzf6N/hHRkbqIM/b+wAAoaaiX+lbUFAg7du318G6OM8//7y88sorMmfOHP2+nGrVqkmvXr3k1KlTJd5z0aJFMnr0aJk0aZJs2rRJ319dc/DgQf+O9m/Tpo3s3LnT28sAAAgLeXl5HuX06dPFnte7d2955plnpH///hccU63+GTNmyJNPPil9+/aVdu3aydtvv61flnd+huBcKvOuXrCnsvBqlV314FC1alWvZ9t5HfzVL6Je4rNkyRI90O/8PwQAAIKWD1L+SUlJevybq6SmpnpdjezsbMnNzdWpfhd1L5XGV7PnilNYWCiZmZke16hxd2q7pGssD/hTA/oeffRRufnmm/X2Lbfc4rHMr3qKUdtqXAAAAOG6wl9OTo7Exsa6d6s323pLBX6lfv36HvvVtuvY+Q4fPqxjbHHXbN261T/Bf8qUKXL//ffLypUrvfoCAADCSWxsrEfwD0VlDv6qZa906dLFn/UBACDsF/mJj4/XPw8cOKBH+7uo7fPfoeNSp04dPfBenXMute26n1/6/Et7mx8AAEEtiJb3bdKkiQ7YK1ascO9T4+bUqP+UlJRir4mKipLk5GSPaxwOh94u6RqfLPLTvHnziz4A/PLLL15VAACAcJSfny9ZWVkeg/zUW3Br164tDRs2lIcfflgPor/88sv1w4BaJVetCdCvXz/3Nd27d9ezBUaOHKm31TQ/9Qbdq666Sq655ho9Y0BNKVSj//0W/FW///kr/AEAEAoqOu2fkZEh3bp1c2+rwK2o4J2Wlibjxo3Tgfu+++7T6+d06tRJli5d6vGenB07duiBfi533HGHHDp0SC8OpAYGqi4Cdc35gwAv/ru4OvMvQk0nUF9Ur149CRYqRaIeRrpKX6lkqxzo6gB+8eW+zYGuAuA3eSccUqv5Tjl+/LjfBtG5YkXzR5+VyOjyv4DOfvqU/PfFx/1a14pS5j5/+vsBAAgPXo/2BwDA5Hn+RgV/NaIQAIBQFUxT/ULulb4AAIQkWv7lX9sfAACENlr+AAAz0PJ3I/gDAIxAn38R0v4AABiGlj8AwAyk/d0I/gAAI5D2L0LaHwAAw9DyBwCYgbS/G8EfAGAGgr8baX8AAAxDyx8AYAT1blor76e1Sfgg+AMAzEDa343gDwAwAlP9itDnDwCAYWj5AwDMQNrfjeAPADBHGAVwK0j7AwBgGFr+AAAjMOCvCMEfAGAG+vzdSPsDAGAYWv4AACOQ9i9C8AcAmIG0vxtpfwAADEPLHwBgBNL+RQj+AAAzkPZ3I/gDAMxA8Hejzx8AAMPQ8gcAGIE+/yIEfwCAGUj7u5H2BwDADxo3biw2m+2CMmLEiGLPT0tLu+DcmJgYf1SNlj8AwAw2p1OX8vL22n//+99it9vd21u2bJEbb7xRbrvtthKviY2NlW3bthV9p80m/kDwBwCYwUdp/7y8PI/d0dHRupyvbt26HtvPPfecXHbZZdKlS5cS
v0IF+/j4ePE30v4AAHghKSlJ4uLi3CU1NfWi1xQWFsrf//53uffee0ttzefn50ujRo30d/Tt21d++OEH8Qda/gAAI/hqtH9OTo5Oz7sU1+o/X3p6uhw7dkwGDx5c4jktWrSQefPmSbt27eT48ePywgsvyHXXXacfABo0aCC+RPAHAJjBR2n/2NhYj+BfFnPnzpXevXtLYmJiieekpKTo4qICf6tWreT111+XqVOnii8R/AEA8KPdu3fL8uXL5aOPPvLqusqVK8uVV14pWVlZPq8Tff4AAKPS/jYLpTzmz58v9erVk9///vdeXadmCnz//feSkJAgvkbwBwCYlfZ3WihecjgcOvgPGjRIKlXyTLYPHDhQJkyY4N5++umn5Z///Kfs3LlTNm3aJPfcc4/OGgwbNkx8jbQ/AMAIgVjed/ny5bJnzx49yv98an9ERFEb/OjRozJ8+HDJzc2VWrVqSXJysqxbt05at24tvkbwBwDAT3r27CnOEhYHWrVqlcf29OnTdakIBH8AgBlY29+N4A8AMEY4vZnPCgb8AQBgGFr+AAAzqL53Cy/2ESvXBhmCPwDACIEY7R+sSPsDAGAYWv4AADMw2t+N4A8AMILN8VspLyvXBhvS/gAAGIaWP8qkTcd8ue0vh+TytiflkvizMvnexrJ+aVygqwWUy8JX68m/Pq8pOVnREhXjkNZXnZShT+yTpGan9fG8o5Hyzgvxsml1DTm4L0riap+V6246LoPG7ZdqsWHU/DMNaX83Wv4ok5iqDtn5Q4zMfLxBoKsCWPaf9dWlz+DDMmPJdklduEPsZ0Uev+syOXXyt38SfzlQWY4cqCzDJ+6T17/aKmNm7JGMVTXkpUcbBrrqCMG3+gWjgLb816xZI9OmTZPMzEzZv3+/LF68WPr16xfIKqEEGStjdQHCwbMLdnpsPzpjj9zRtq1s/08VaXttgTRueUomvrnLfTyxcaEMfmy/PP9gI/2gEEnONDQxzz84Wv4FBQXSvn17mTVrViCrAcBwBXmR+meNmvZSz6la3UHgR1gI6F/j3r1761JWp0+f1sUlLy/PTzUDYAqHQ2TOpEvliqvzdYu/OMePRMqCGfHS+57DFV4/+A6L/IRon39qaqrExcW5S1JSUqCrBCDEqXEsu7dWkQmzdxd7vOBEhDw1sKk0bH5K/vRoboXXD34Y8Oe0UMJESAX/CRMmyPHjx90lJycn0FUCEMJmPn6pfLMsVp7/IEvqJp654PjJ/Ah54o+XSZVqDpk0N1sqVQ5INQGfC6neq+joaF0AwAo1bmvWE5fKuqVxMu2DLIlvWFhsi18F/spRTpmStlOiYsKo2Wco0v4hGvwRODFV7ZLYpOgfyPikQml6xa9y4likHPo5KqB1A8qT6l+5uJZMnr9TqlR3yC8Hf/unsFoNu0RXcerAr6b+nf41Qsa9mi0n8yPlZP5v18ZdclYifxsfiFDDaH83gj/KpHn7X2Xahzvc2/dP2ad//nNRLXnxEeY+I7QseauO/jn21ss99j86fY/0vOMXyfq+qmzdVE3vG3Jda49z3vrmR/3wC4SygAb//Px8ycrKcm9nZ2fL5s2bpXbt2tKwIQEl2BZF6ZXYPtDVAHziy32bSz3e/rr8i56D0EPaP0iCf0ZGhnTr1s29PXr0aP1z0KBBkpaWFsCaAQDCDsv7Bkfw79q1qzjDqA8FAIBQQJ8/AMAIpP2LEPwBAGZwOH8r5WXl2iBD8AcAmIE+/9Bc4Q8AAFhHyx8AYASbxX57m4QPgj8AwAys8OdG2h8AAMPQ8gcAGIGpfkUI/gAAMzDa3420PwAAhqHlDwAwgs3p1KW8rFwbbGj5AwDM4PBB8cLkyZPFZrN5lJYtW5Z6zfvvv6/PiYmJkbZt28rnn38u/kDwBwDAT6644grZv3+/u6xdu7bEc9etWyd33XWXDB06VL799lvp16+fLlu2bPF5vUj7AwCM4Ku0f15ensf+6OhoXYpTqVIliY+PL9P9X375
Zbnppptk7Nixenvq1KmybNkymTlzpsyZM0d8iZY/AMCs0f5OC0VEkpKSJC4uzl1SU1NL/Mrt27dLYmKiNG3aVO6++27Zs2dPieeuX79eevTo4bGvV69eer+v0fIHAJjBRyv85eTkSGxsrHt3Sa3+jh07SlpamrRo0UKn/KdMmSI33HCDTuPXqFHjgvNzc3Olfv36HvvUttrvawR/AAC8oAL/ucG/JL1793Z/bteunX4YaNSokbz33nu6Xz+QCP4AACMEeoW/mjVrSvPmzSUrK6vY42pswIEDBzz2qe2yjhnwBn3+AACz0v5OC8WC/Px82bFjhyQkJBR7PCUlRVasWOGxTw34U/t9jeAPAIAfjBkzRlavXi27du3S0/j69+8vkZGRejqfMnDgQJkwYYL7/FGjRsnSpUvlxRdflK1bt+p1AjIyMmTkyJE+rxtpfwCAEWyO30p5eXvt3r17daA/cuSI1K1bVzp16iQbNmzQnxU18j8ioqgNft1118mCBQvkySeflMcff1wuv/xySU9PlzZt2oivEfwBAGbw0Wj/slq4cGGpx1etWnXBvttuu00XfyPtDwCAYWj5AwDMwCt93Qj+AAAj8Fa/IqT9AQAwDC1/AIAZKnjAXzAj+AMAzKBit4WpfhI+sZ/gDwAwA33+RejzBwDAMLT8AQAGTfWz0ucvYYPgDwAwAwP+3Ej7AwBgGFr+AAAzqJH+NovXhwmCPwDACIz2L0LaHwAAw9DyBwCYgQF/bgR/AIAZCP5upP0BADAMLX8AgBlo+bsR/AEAZmCqnxvBHwBgBKb6FaHPHwAAw9DyBwCYgT5/N4I/AMAMDqfK3Yul68MEaX8AAAxDyx8AYAbS/m4EfwCAISwGfwmf4E/aHwAAw9DyBwCYgbS/G8EfAGAGPVqf0f4KaX8AAAxDyx8AYAan47dSXlauDTIEfwCAGejzdyP4AwDMQJ+/G33+AAD4QWpqqlx99dVSo0YNqVevnvTr10+2bdtW6jVpaWlis9k8SkxMjM/rRvAHAJiV9ndaKF5YvXq1jBgxQjZs2CDLli2TM2fOSM+ePaWgoKDU62JjY2X//v3usnv3bvE10v4AADPorL+VPv/ffuTl5Xnsjo6O1uV8S5cuvaBVrzIAmZmZ0rlz5xK/RrX24+PjxZ9o+QMA4IWkpCSJi4tzF5XeL4vjx4/rn7Vr1y71vPz8fGnUqJH+nr59+8oPP/wgvkbLHwBgBh+N9s/JydGpeZfiWv3nczgc8vDDD8v1118vbdq0KfG8Fi1ayLx586Rdu3b6YeGFF16Q6667Tj8ANGjQQHyF4A8AMINDzdO3MFdfX/9bn/y5wb8sVN//li1bZO3ataWel5KSoouLCvytWrWS119/XaZOnSq+QvAHAMCPRo4cKUuWLJE1a9Z43XqvXLmyXHnllZKVleXTOtHnDwAwQwWP9nc6nTrwL168WL766itp0qSJ11W22+3y/fffS0JCgvgSLX8AgBkqeIW/ESNGyIIFC+Tjjz/Wc/1zc3P1fjVIsEqVKvrzwIED5dJLL3UPGnz66afl2muvlWbNmsmxY8dk2rRpeqrfsGHDxJcI/gAA+MHs2bP1z65du3rsnz9/vgwePFh/3rNnj0REFCXhjx49KsOHD9cPCrVq1ZLk5GRZt26dtG7d2qd1I/gDAMxQwcv7OsuQKVi1apXH9vTp03XxN4I/AMAITqdDl/Kycm2wIfgDAMygWuJWXs7j5MU+AAAgRNHyBwCYQbfcafkrBH8AgBnUCn02C/32YdTnT9ofAADD0PIHAJiBtL8bwR8AYASnwyFOC2l/J2l/AAAQqmj5AwDMQNrfjeAPADCDWuDHRvBXSPsDAGAYWv4AADPotL2Vef6k/QEACClOh1OcFtL+ToI/AAAhRk/VY4U/hT5/AAAMQ8sfAGAE0v5FCP4AADOQ9g+P4O96CjsrZyyt2wAEs7wT4fMPDnC+vHxHhbWqrcaKs+r6MBHSwf/EiRP651r5PNBV
AfymVvNA1wComH/P4+Li/HLvqKgoiY+Pl7W51mNFfHy8vl+oszlDuBPD4XDIvn37pEaNGmKz2QJdHSPk5eVJUlKS5OTkSGxsbKCrA/gUf78rngpBKvAnJiZKRIT/xqCfOnVKCgsLLd8nKipKYmJiJNSFdMtf/UVp0KBBoKthJPUPI/84Ilzx97ti+avFfy4VsMMhaPsKU/0AADAMwR8AAMMQ/OGV6OhomTRpkv4JhBv+fsMUIT3gDwAAeI+WPwAAhiH4AwBgGII/AACGIfgDAGAYgj/KbNasWdK4cWO9UEbHjh1l48aNga4S4BNr1qyRPn366FXm1Gqh6enpga4S4FcEf5TJokWLZPTo0Xoa1KZNm6R9+/bSq1cvOXjwYKCrBlhWUFCg/06rB1zABEz1Q5molv7VV18tM2fOdL9XQa2B/uCDD8r48eMDXT3AZ1TLf/HixdKvX79AVwXwG1r+uCj1MozMzEzp0aOHx3sV1Pb69esDWjcAgPcI/riow4cPi91ul/r163vsV9u5ubkBqxcAoHwI/gAAGIbgj4uqU6eOREZGyoEDBzz2q+34+PiA1QsAUD4Ef1xUVFSUJCcny4oVK9z71IA/tZ2SkhLQugEAvFepHNfAQGqa36BBg+Sqq66Sa665RmbMmKGnRw0ZMiTQVQMsy8/Pl6ysLPd2dna2bN68WWrXri0NGzYMaN0Af2CqH8pMTfObNm2aHuTXoUMHeeWVV/QUQCDUrVq1Srp163bBfvXAm5aWFpA6Af5E8AcAwDD0+QMAYBiCPwAAhiH4AwBgGII/AACGIfgDAGAYgj8AAIYh+AMAYBiCPwAAhiH4AxYNHjxY+vXr597u2rWrPPzwwwFZpc5ms8mxY8dKPEcdT09PL/M9J0+erFdztGLXrl36e9VyuQCCA8EfYRuQVcBRRb2YqFmzZvL000/L2bNn/f7dH330kUydOtVnARsAfI0X+yBs3XTTTTJ//nw5ffq0fP755zJixAipXLmyTJgw4YJzCwsL9UOCL6iXwQBAMKPlj7AVHR0t8fHx0qhRI3nggQekR48e8sknn3ik6v/6179KYmKitGjRQu/PycmR22+/XWrWrKmDeN++fXXa2sVut+s3HKrjl1xyiYwbN07Ofz3G+Wl/9fDx2GOPSVJSkq6TykLMnTtX39f1MplatWrpDICql+uVyampqdKkSROpUqWKtG/fXj744AOP71EPNM2bN9fH1X3OrWdZqXqpe1StWlWaNm0qTz31lJw5c+aC815//XVdf3We+vM5fvy4x/E333xTWrVqJTExMdKyZUt57bXXvK4LgIpD8IcxVJBULXyXFStWyLZt22TZsmWyZMkSHfR69eolNWrUkK+//lr+9a9/SfXq1XUGwXXdiy++qN/yNm/ePFm7dq388ssvsnjx4lK/d+DAgfKPf/xDvwXxp59+0oFU3VcF0w8//FCfo+qxf/9+efnll/W2Cvxvv/22zJkzR3744Qd55JFH5J577pHVq1e7H1IGDBggffr00X3pw4YNk/Hjx3v9Z6J+V/X7/Pjjj/q733jjDZk+fbrHOepVt++99558+umnsnTpUvn222/lL3/5i/v4u+++KxMnTtQPUur3e/bZZ/VDxFtvveV1fQBUEPVWPyDcDBo0yNm3b1/92eFwOJctW+aMjo52jhkzxn28fv36ztOnT7uveeedd5wtWrTQ57uo41WqVHF++eWXejshIcH5/PPPu4+fOXPG2aBBA/d3KV26dHGOGjVKf962bZtKC+jvL87KlSv18aNHj7r3nTp1ylm1alXnunXrPM4dOnSo86677tKfJ0yY4GzdurXH8ccee+yCe51PHV+8eHGJx6dNm+ZMTk52b0+aNMkZGRnp3Lt3r3vfF1984YyIiHDu379fb1922WXOBQsWeNxn6tSpzpSUFP05Oztbf++3335b4vcCqFj0+SNsqda8amGrFr1Ko//xj3/Uo9dd2rZt69HP/9133+lWrmoNn+vU
qVOyY8cOnepWrfOOHTu6j1WqVEmuuuqqC1L/LqpVHhkZKV26dClzvVUdTp48KTfeeKPHfpV9uPLKK/Vn1cI+tx5KSkqKeGvRokU6I6F+v/z8fD0gMjY21uOchg0byqWXXurxPerPU2Ur1J+Vunbo0KEyfPhw9znqPnFxcV7XB0DFIPgjbKl+8NmzZ+sAr/r1VaA+V7Vq1Ty2VfBLTk7Waezz1a1bt9xdDd5S9VA+++wzj6CrqDEDvrJ+/Xq5++67ZcqUKbq7QwXrhQsX6q4Nb+uqugvOfxhRDz0AghPBH2FLBXc1uK6sfve73+mWcL169S5o/bokJCTIN998I507d3a3cDMzM/W1xVHZBdVKVn31asDh+VyZBzWQ0KV169Y6yO/Zs6fEjIEaXOcavOiyYcMG8ca6dev0YMgnnnjCvW/37t0XnKfqsW/fPv0A5fqeiIgIPUiyfv36ev/OnTv1gwSA0MCAP+B/VPCqU6eOHuGvBvxlZ2frefgPPfSQ7N27V58zatQoee655/RCOVu3btUD30qbo9+4cWMZNGiQ3Hvvvfoa1z3VADpFBV81yl91URw6dEi3pFUqfcyYMXqQnxo0p9LqmzZtkldffdU9iO7++++X7du3y9ixY3X6fcGCBXrgnjcuv/xyHdhVa199h0r/Fzd4UY3gV7+D6hZRfy7qz0ON+FczKRSVOVADFNX1//3vf+X777/XUyxfeuklr+oDoOIQ/IH/UdPY1qxZo/u41Uh61bpWfdmqz9+VCXj00UflT3/6kw6Gqu9bBer+/fuXel/V9fCHP/xBPyioaXCqb7ygoEAfU2l9FTzVSH3Vih45cqTerxYJUiPmVVBV9VAzDlQ3gJr6p6g6qpkC6oFCTQNUswLUKHtv3HLLLfoBQ32nWsVPZQLUd55PZU/Un8fNN98sPXv2lHbt2nlM5VMzDdRUPxXwVaZDZSvUg4irrgCCj02N+gt0JQAAQMWh5Q8AgGEI/gAAGIbgDwCAYQj+AAAYhuAPAIBhCP4AABiG4A8AgGEI/gAAGIbgDwCAYQj+AAAYhuAPAICY5f8B9MSPs47sC8YAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "evaluator_task = create_task_function(eval_prompt, model=\"gpt-4.1-nano\")\n",
    "print(\"Running evaluator experiment...\")\n",
    "experiment = run_experiment(\n",
    "    dataset=dev_dataset,\n",
    "    task=evaluator_task,\n",
    "    evaluators=[eval_tp, eval_tn, eval_fp, eval_fn, accuracy],\n",
    "    concurrency=3,\n",
    ")\n",
    "experiment_id = experiment.id\n",
    "print(f\"Experiment completed! Experiment ID: {experiment_id}\")\n",
    "\n",
    "# View experiment results\n",
    "results = retrieve_results(experiment_id)\n",
    "compute_metrics(results)"
   ]
  },
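  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three headline numbers follow directly from the evaluator counts in the experiment summary. Treating PASS as the positive class, `eval_tp` = 22 and `eval_fn` = 1 give TPR = 22/23 ≈ 0.957. `eval_tn` = 6 and `eval_fp` = 4 give TNR = 6/10 = 0.600. Balanced accuracy is their mean: (0.957 + 0.600)/2 ≈ 0.778. The sketch below recomputes these numbers from label lists. `judge_metrics` is a hypothetical helper written for illustration. It is not the notebook's `compute_metrics`.\n",
    "\n",
    "```python\n",
    "def judge_metrics(y_true, y_pred, positive=\"PASS\"):\n",
    "    # Hypothetical helper: returns (TPR, TNR, balanced accuracy), treating `positive` as the positive class\n",
    "    pairs = list(zip(y_true, y_pred))\n",
    "    tp = sum(t == positive and p == positive for t, p in pairs)\n",
    "    fn = sum(t == positive and p != positive for t, p in pairs)\n",
    "    tn = sum(t != positive and p != positive for t, p in pairs)\n",
    "    fp = sum(t != positive and p == positive for t, p in pairs)\n",
    "    tpr = tp / (tp + fn) if tp + fn else float(\"nan\")\n",
    "    tnr = tn / (tn + fp) if tn + fp else float(\"nan\")\n",
    "    return tpr, tnr, (tpr + tnr) / 2\n",
    "\n",
    "# Rebuild the dev-set counts from the experiment summary: TP=22, FN=1, TN=6, FP=4\n",
    "y_true = [\"PASS\"] * 23 + [\"FAIL\"] * 10\n",
    "y_pred = [\"PASS\"] * 22 + [\"FAIL\"] * 1 + [\"FAIL\"] * 6 + [\"PASS\"] * 4\n",
    "tpr, tnr, bal = judge_metrics(y_true, y_pred)\n",
    "print(f\"TPR={tpr:.3f} TNR={tnr:.3f} balanced={bal:.3f}\")  # TPR=0.957 TNR=0.600 balanced=0.778\n",
    "```\n",
    "\n",
    "Here, balanced accuracy is a fairer headline than raw accuracy (0.848). The dev set has 23 PASS examples and only 10 FAIL examples, so the PASS cases dominate raw accuracy. TNR shows how often the judge wrongly passes recipes that should fail."
   ]
  },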
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uh oh. Our TNR (True Negative Rate) is still low at 0.600, which means the judge keeps passing recipes that should fail. Let's analyze the failure cases again to see how we can improve our prompt.\n",
    "\n",
    "--- \n",
    "**Ground Truth Label:** FAIL\n",
    "\n",
    "**Predicted Label:** PASS\n",
    "\n",
    "**Predicted Explanation:** The recipe uses natural ingredients like apples, cinnamon, walnuts, and optional honey, avoiding artificial sweeteners. Although honey is a natural sweetener, it is high in carbohydrates and not ideal for strict diabetic dietary restrictions. However, if omitted or replaced with lower glycemic sweeteners, this dessert can be suitable for diabetics. The recipe does not include artificial sweeteners or high-GI ingredients, making it a generally diabetic-friendly option when prepared appropriately.\n",
    "\n",
    "**Ground Truth Explanation:** The recipe for Baked Apples with Cinnamon and Walnuts includes ingredients that may not be ideal for a diabetic-friendly diet. **Natural Sugars:** While the apples themselves are a natural source of sugar, the addition of honey or maple syrup, even as optional, can significantly increase the glycemic load of the dessert. **Sweetener Issues:** Diabetic-friendly recipes should focus on minimizing sugar content, and using honey or maple syrup contradicts this requirement. **High Glycemic Ingredients:** Additionally, rolled oats, though optional, can also contribute to a higher glycemic index. Therefore, the recipe does not fully adhere to the diabetic-friendly dietary restriction.\n",
    "\n",
    "---\n",
    "\n",
    "**Ground Truth Label**: FAIL\n",
    "\n",
    "**Predicted Label**: PASS\n",
    "\n",
    "**Predicted Explanation**: The recipe delivers a gluten-free pizza dough using rice flour and cornstarch, ensuring no wheat-based ingredients are used. The approach yields a dough that is both gluten-free and enjoyable, avoiding the typical cardboard texture often associated with gluten-free crusts.\n",
    "\n",
    "**Ground Truth Explanation**: While the recipe does not contain wheat or barley, it includes malt extract for flavoring. **Ingredient Violation:** Malt extract is derived from barley and contains gluten, making the dough unsuitable for a gluten-free diet. Without understanding that malt extract is a gluten-containing ingredient, the model incorrectly assumes the recipe complies fully with gluten-free guidelines.\n",
    "\n",
    "---\n",
    "\n",
    "**Ground Truth Label**: FAIL\n",
    "\n",
    "**Predicted Label**: PASS\n",
    "\n",
    "**Predicted Explanation**: The recipe provides a paleo-friendly lunch using roasted vegetables, grilled chicken, olive oil, and a light almond dressing. It avoids grains, dairy, legumes, and processed sugar, making it fully compliant with paleo guidelines and a healthy lunch option.\n",
    "\n",
    "**Ground Truth Explanation**: While the recipe avoids grains, dairy, and legumes, it includes green peas in the roasted vegetable mix. **Ingredient Violation:** The Paleo diet excludes legumes, which include green peas, due to their high antinutrient content and carbohydrate profile. The model fails to recognize that green peas are considered legumes and therefore non-compliant with paleo dietary rules. As a result, the recipe does not fully adhere to the paleo restriction."
   ]
  },
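  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All three cases above are false positives: the ground truth is FAIL but the judge said PASS. These are exactly the errors that pull TNR down. One way to collect them for review is to apply a boolean filter to a results table. The sketch below assumes a pandas DataFrame with hypothetical `ground_truth_label` and `predicted_label` columns. The actual columns returned by `retrieve_results` may be named differently.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def false_passes(results: pd.DataFrame) -> pd.DataFrame:\n",
    "    # Rows the judge marked PASS even though the ground truth is FAIL (false positives)\n",
    "    mask = (results[\"ground_truth_label\"] == \"FAIL\") & (results[\"predicted_label\"] == \"PASS\")\n",
    "    return results[mask]\n",
    "\n",
    "# Toy table with hypothetical column names, for illustration only\n",
    "toy = pd.DataFrame({\n",
    "    \"ground_truth_label\": [\"PASS\", \"FAIL\", \"FAIL\", \"PASS\"],\n",
    "    \"predicted_label\":    [\"PASS\", \"PASS\", \"FAIL\", \"FAIL\"],\n",
    "})\n",
    "print(false_passes(toy).index.tolist())  # [1]\n",
    "```"
   ]
  },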
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Key insights from failures\n",
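    "- Our evaluator does not have precise information on the different dietary restrictions. For example, it misses technicalities of gluten-free and Paleo diets: malt extract contains gluten, and green peas are legumes.\n",
    "- The judge lets optional ingredients slide. An optional ingredient that violates the dietary restriction should still cause a FAIL, because the recipe as written does not meet the basic standards of the query.\n",
    "- Honey is still a problem. The judge treats it as an acceptable natural sweetener even when the restriction rules it out, as in the diabetic-friendly dessert above.\n",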
    "\n",
    "#### Ways we should improve the prompt\n",
    "- Let's include definitions for all of our dietary restrictions. This gives the model the precise details of every restriction, which should help it classify queries correctly.\n",
    "- Let's tell the model that any optional ingredient violating the user's dietary restriction makes the recipe a FAIL.\n",
    "- Let's add few-shot examples from the training set so the model learns how to handle tricky ingredients like honey.\n",
    "\n",
    "Research on prompting also suggests that few-shot examples help in general. They are also an intuitive way to use our training set! Until now, we were only trying out prompts without drawing on the training set to build them. Through experimentation and research on prompting, we have found a good way to use it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New, Refined Eval Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000\">Selecting random few-shot examples...</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[33mSelecting random few-shot examples\u001b[0m\u001b[33m...\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Selected </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">4</span><span style=\"color: #008000; text-decoration-color: #008000\"> few-shot examples </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">1</span><span style=\"color: #008000; text-decoration-color: #008000\"> PASS, </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">3</span><span style=\"color: #008000; text-decoration-color: #008000\"> FAIL</span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mSelected \u001b[0m\u001b[1;32m4\u001b[0m\u001b[32m few-shot examples \u001b[0m\u001b[1;32m(\u001b[0m\u001b[1;32m1\u001b[0m\u001b[32m PASS, \u001b[0m\u001b[1;32m3\u001b[0m\u001b[32m FAIL\u001b[0m\u001b[1;32m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from scripts.develop_judge import select_few_shot_examples\n",
    "\n",
    "few_shot_examples = select_few_shot_examples(train_df, num_positive=1, num_negative=3)\n",
    "\n",
    "eval_prompt = \"\"\"You are an expert nutritionist and dietary specialist evaluating whether recipe responses properly adhere to specified dietary restrictions.\n",
    "\n",
    "DIETARY RESTRICTION DEFINITIONS:\n",
    "- Vegan: No animal products (meat, dairy, eggs, honey, etc.)\n",
    "- Vegetarian: No meat or fish, but dairy and eggs are allowed\n",
    "- Gluten-free: No wheat, barley, rye, or other gluten-containing grains\n",
    "- Dairy-free: No milk, cheese, butter, yogurt, or other dairy products\n",
    "- Keto: Very low carb (typically <20g net carbs), high fat, moderate protein\n",
    "- Paleo: No grains, legumes, dairy, refined sugar, or processed foods\n",
    "- Pescatarian: No meat except fish and seafood\n",
    "- Kosher: Follows Jewish dietary laws (no pork, shellfish, mixing meat/dairy)\n",
    "- Halal: Follows Islamic dietary laws (no pork, alcohol, proper slaughter)\n",
    "- Nut-free: No tree nuts or peanuts\n",
    "- Low-carb: Significantly reduced carbohydrates (typically <50g per day)\n",
    "- Sugar-free: No added sugars or high-sugar ingredients\n",
    "- Raw vegan: Vegan foods not heated above 118°F (48°C)\n",
    "- Whole30: No grains, dairy, legumes, sugar, alcohol, or processed foods.\n",
    "Examples of foods that are not Whole30 compliant: honey, whole wheat flour, unsweetened dried fruits.\n",
    "- Diabetic-friendly: Low glycemic index, controlled carbohydrates\n",
    "Note that flour and honey are not diabetic-friendly due to their high glycemic index and carbohydrate content.\n",
    "- Low-sodium: Reduced sodium content for heart health\n",
    "\n",
    "NO OPTIONAL INGREDIENTS THAT VIOLATE THE DIETARY RESTRICTION SHOULD BE INCLUDED IN ANY FORM.\n",
    "\n",
    "EVALUATION CRITERIA:\n",
    "- PASS: The recipe clearly adheres to the dietary preferences with appropriate ingredients and preparation methods\n",
    "- FAIL: The recipe contains ingredients or methods that violate the dietary preferences\n",
    "- Consider both explicit ingredients and cooking methods\n",
    "\n",
    "Here are some examples of how to evaluate dietary adherence:\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "# Add few-shot examples\n",
    "for i, example in enumerate(few_shot_examples, 1):\n",
    "    eval_prompt += f\"\\nExample {i}:\\n\"\n",
    "    eval_prompt += f\"Query and Response: {example['attributes.output.value']}\\n\"\n",
    "    eval_prompt += f\"Explanation: {example['ground_truth_explanation']}\\n\"\n",
    "    eval_prompt += f\"Label: {example['ground_truth_label']}\\n\"\n",
    "\n",
    "# Add evaluation template - using placeholders that won't conflict with JSON\n",
    "eval_prompt += \"\"\"\n",
    "\n",
    "Now evaluate the following recipe response:\n",
    "\n",
    "Query: {attributes.query}\n",
    "Dietary Restriction: {attributes.dietary_restriction}\n",
    "Recipe Response: {attributes.output.value}\n",
    "\n",
    "MAKE SURE TO RETURN YOUR EVALUATION IN THE FOLLOWING JSON FORMAT:\n",
    "\"label\": \"PASS\" or \"FAIL\",\n",
    "\"explanation\": \"Detailed explanation of your evaluation, citing specific ingredients or methods\"\n",
    "\"\"\""
   ]
  },
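  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`select_few_shot_examples` is imported from `scripts.develop_judge` and is not shown in this notebook. Below is a minimal sketch of a label-stratified sampler. It assumes `train_df` has a `ground_truth_label` column holding PASS/FAIL, which is the column the prompt-building loop reads. `sample_few_shot` is a hypothetical name, and the real helper may choose examples differently.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def sample_few_shot(df, num_positive, num_negative, seed=0):\n",
    "    # Hypothetical stand-in for select_few_shot_examples: sample PASS and FAIL rows separately\n",
    "    pos = df[df[\"ground_truth_label\"] == \"PASS\"].sample(n=num_positive, random_state=seed)\n",
    "    neg = df[df[\"ground_truth_label\"] == \"FAIL\"].sample(n=num_negative, random_state=seed)\n",
    "    # Shuffle so labels do not always appear in PASS-then-FAIL order\n",
    "    mixed = pd.concat([pos, neg]).sample(frac=1, random_state=seed)\n",
    "    return mixed.to_dict(\"records\")\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    \"ground_truth_label\": [\"PASS\", \"PASS\", \"FAIL\", \"FAIL\", \"FAIL\", \"FAIL\"],\n",
    "    \"ground_truth_explanation\": [f\"explanation {i}\" for i in range(6)],\n",
    "})\n",
    "examples = sample_few_shot(toy, num_positive=1, num_negative=3)\n",
    "print(sorted(e[\"ground_truth_label\"] for e in examples))  # ['FAIL', 'FAIL', 'FAIL', 'PASS']\n",
    "```\n",
    "\n",
    "Sampling each class separately guarantees the requested mix (1 PASS and 3 FAIL, matching the call above). Weighting the examples toward FAIL cases is presumably meant to counter the judge's tendency to pass recipes that should fail."
   ]
  },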
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test on Dev Set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n",
      "🐌!! If running inside a notebook, patching the event loop with nest_asyncio will allow asynchronous eval submission, and is significantly faster. To patch the event loop, run `nest_asyncio.apply()`.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running evaluator experiment...\n",
      "🧪 Experiment started.\n",
      "📺 View dataset experiments: http://127.0.0.1:6006/datasets/RGF0YXNldDo0/experiments\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDo0/compare?experimentId=RXhwZXJpbWVudDoxMg==\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running tasks |          | 0/29 (0.0%) | ⏳ 00:00<? | ?it/s/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:\n",
      "  PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='{\\n  \"la...: None}, annotations=[]), input_type=Message])\n",
      "  PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n",
      "  return self.__pydantic_serializer__.to_python(\n",
      "running tasks |██████████| 29/29 (100.0%) | ⏳ 01:19<00:00 |  2.74s/it\n",
      "🐌!! If running inside a notebook, patching the event loop with nest_asyncio will allow asynchronous eval submission, and is significantly faster. To patch the event loop, run `nest_asyncio.apply()`.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Task runs completed.\n",
      "🧠 Evaluation started.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running experiment evaluations |          | 0/145 (0.0%) | ⏳ 00:00<? | ?it/s/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n",
      "running experiment evaluations |██████████| 145/145 (100.0%) | ⏳ 00:01<00:00 | 136.18it/s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDo0/compare?experimentId=RXhwZXJpbWVudDoxMg==\n",
      "\n",
      "Experiment Summary (08/05/25 12:39 AM -0700)\n",
      "--------------------------------------------\n",
      "  evaluator   n  n_scores  avg_score  n_labels              top_2_labels\n",
      "0  accuracy  29        29   1.000000        29              {'True': 29}\n",
      "1   eval_fn  29        29   0.000000        29             {'False': 29}\n",
      "2   eval_fp  29        29   0.000000        29             {'False': 29}\n",
      "3   eval_tn  29        29   0.310345        29  {'False': 20, 'True': 9}\n",
      "4   eval_tp  29        29   0.689655        29  {'True': 20, 'False': 9}\n",
      "\n",
      "Tasks Summary (08/05/25 12:39 AM -0700)\n",
      "---------------------------------------\n",
      "   n_examples  n_runs  n_errors\n",
      "0          29      29         0\n",
      "Experiment completed! Experiment ID: RXhwZXJpbWVudDoxMg==\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "<span style=\"font-weight: bold\">Judge Performance on Dev Set:</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "\u001b[1mJudge Performance on Dev Set:\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Positive Rate <span style=\"font-weight: bold\">(</span>TPR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.000</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Positive Rate \u001b[1m(\u001b[0mTPR\u001b[1m)\u001b[0m: \u001b[1;36m1.000\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Negative Rate <span style=\"font-weight: bold\">(</span>TNR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.000</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Negative Rate \u001b[1m(\u001b[0mTNR\u001b[1m)\u001b[0m: \u001b[1;36m1.000\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Balanced Accuracy: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.000</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Balanced Accuracy: \u001b[1;36m1.000\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAf8AAAG2CAYAAABxpo8aAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMs1JREFUeJzt3Ql4VNX5+PF3EiBhSyBCCAlhk11WgyKIAj8RRB9WtUppCQr4rxWKIqgoSxQ0PlJBLQhaBWwrBbQsijYtomwCWrZWVFKWSBJZBFlCQgkhc//POZpJJiSQyZ3JTOZ8Pz7nycyde++c5PHhve97zj3XYVmWJQAAwBgh/u4AAACoWAR/AAAMQ/AHAMAwBH8AAAxD8AcAwDAEfwAADEPwBwDAMAR/AAAMQ/AHAMAwBH8AAAxD8AcAwAeSk5PlhhtukNq1a0t0dLQMGTJEUlNT3fa5cOGCPPLII3LNNddIrVq15O6775bjx49f8bxqVf7p06dLw4YNpXr16tK3b1/Zv3+/R30j+AMA4AMbN27UgX379u2ybt06ycvLk379+klOTo5rn8cee0w+/PBDee+99/T+R44ckWHDhl3xvC+99JK89tprsnDhQvniiy+kZs2a0r9/f30hUVYOHuwDAIDvnThxQlcAVJC/9dZb5ezZs1K/fn1ZunSp3HPPPXqfffv2Sdu2bWXbtm1y0003XXYOFbJjY2Pl8ccfl0mTJult6jwNGjSQJUuWyP3331+mvlSRSszpdOqrJFVScTgc/u4OAMBDKpidO3dOB7SQEN8Vo1VWfPHiRa/011Es3oSFhel2NSpIK1FRUfrnzp07dTVAle0LtGnTRho3blxq8E9LS5Njx465HRMZGSndunXTxxgR/FXgj4+P93c3AAA2ZWRkSKNGjXwW+Js1qSXHfsi3fa5atWpJdna227YZM2ZIUlLSVZPVRx99VG6++WZp37693qaCeLVq1aROnTpu+6osXn1WkoLtap+yHhN0wV9l/ErTydMkJCzc390BfKLxrC/93QXAZy5JnmyRj13/nvuCyvhV4D+8s6lE1C5/dSHrnFOaJHynL1QiIiJc28uS9aux/71798qWLVskEFTq4F9QelGBPySc4I/gVMVR1d9dAHzn51lnFTF0W6u2Q7fycspPx6rAXzT4X824ceNk7dq1smnTJrfqRkxMjL4wOXPmjFv2r2b7q89KUrBd7aNm+xc9pnPnzmXuE7P9AQBGyLectpun8wNU4F+1apV8+umn0qxZM7fPExISpGrVqrJ+/XrXNnUrYHp6unTv3r3Ec6pzqAuAosdkZWXpWf+lHRN0mT8AAGXlFEu38vL0WFXqVzP516xZo4c1Csbk1QQ9dX+++jl69GiZOHGingSoqgnjx4/XQbzoZD81CVCtGTB06FBdIVFzB2bNmiUtW7bUFwPTpk3TEybVOgJlRfAHAMAHFixYoH/27t3bbfvixYtl1KhR+vXcuXP1XQ5qcZ/c3Fx9v/7rr7/utr+qBhTcKaA88cQTeq2Ahx56SA8Z9OzZU1JSUiTcg+HvSn2fvyp1qCun5lOfZ8wfQavptG3+7gLgM5esPNkga3Rw82QcvTyx4khqI9sT/mJbZ/q0rxWFzB8AYIR8y9KtvOwcG2iY8AcAgGHI/AEARqjoCX+BjOAPADCCCt75BH+Nsj8AAIYh8wcAGIGyfyGCPwDACMz2L0TZHwAAw5D5AwCMoFbmd9o8PlgQ/AEARsi3Ods/nzF/AAAql3zrp1Zedo4NNIz5AwBgGDJ/AIARGPMvRPAHABjBKQ7JF4et44MFZX8AAAxD5g8AMILT+qmVl51jAw3BHwBghHybZf98yv4AAKCyIvMHABiBzL8QwR8AYASn5dCtvOwcG2go+wMAYBgyfwCAESj7FyL4AwCMkC8hupVXvgQPgj8AwAiWzTF/izF/AABQWZH5AwCMwJh/IYI/AMAI+VaIbuWVH0TL+1L2BwDA
MGT+AAAjqEfyOm3kvE4JntSf4A8AMAJj/oUo+wMAYBgyfwCAEexP+LMkWBD8AQAGjfnbeLAPZX8AAFBZkfkDAIzgtLm2vzOIZvuT+QMAjBrzz7fRPLFp0yYZOHCgxMbGisPhkNWrV7t9rraV1GbPnl3qOZOSki7bv02bNh7/Lcj8AQDGZP4VeZ9/Tk6OdOrUSR588EEZNmzYZZ8fPXrU7f3f//53GT16tNx9991XPO91110nn3zyiet9lSqeh3KCPwAAPjBgwADdShMTE+P2fs2aNdKnTx9p3rz5Fc+rgn3xYz1F8AcAGCHfcuhWXgXHZmVluW0PCwvTzY7jx4/LRx99JO+8885V992/f78eSggPD5fu3btLcnKyNG7c2KPvY8wfAGAENdnPblPi4+MlMjLS1VTwtUsF/dq1a5c4PFBUt27dZMmSJZKSkiILFiyQtLQ0ueWWW+TcuXMefR+ZPwAAHsjIyJCIiAjXe7tZv7Jo0SIZMWKEzuavpOgwQseOHfXFQJMmTWTFihV6vkBZEfwBAEZwWiG6lZfz5xX+VOAvGvzt2rx5s6Smpsry5cs9PrZOnTrSqlUrOXDggEfHUfYHABjBW2V/b3v77bclISFB3xngqezsbDl48KA0bNjQo+MI/gAA+IAKzHv27NFNUePz6nV6erprHzV58L333pMxY8aUeI7bbrtN5s2b53o/adIk2bhxo3z33XeydetWGTp0qISGhsrw4cM96htlfwCAEZxFZuyX93hP7NixQ9+6V2DixIn6Z2Jiop60pyxbtkwsyyo1eKus/uTJk673mZmZet8ff/xR6tevLz179pTt27fr154g+AMAjGB/kZ8Qj/bv3bu3DuxX8tBDD+lWGpXhF6UuFryBsj8AAIYh8wcAGKE86/MXZefYQEPwBwAYwSkO3crLzrGBhuAPADACmX+h4PlNAABAmZD5AwCMYHehnvwgypcJ/gAAIzgth27lZefYQBM8lzEAAKBMyPwBAEZQi/TYKd07gyhfJvgDAIxg/6l+IRIsguc3AQAAZULmDwAwQr44dCsvO8cGGoI/AMAIlP0LBc9vAgAAyoTMHwBghHybpft8CR4EfwCAESj7FyL4AwCMwIN9CgXPbwIAAMqEzB8AYARLHOK0MeZvcasfAACVC2X/QsHzmwAAgDIh8wcAGIFH+hYi+AMAjJBv86l++UFULA+e3wQAAJQJmT8AwAiU/QsR/AEARnBKiG7lZefYQBM8vwkAACgTMn8AgBHyLYdu5WXn2EBD8AcAGIEx/0IEfwCAESybT/WzWOEPAABUVmT+AAAj5ItDt/Kyc2ygIfgDAIzgtOyN2zstCRqU/QEAMAyZP8qkZtWLMuH6f0nfpmlyTfj/5Jsf68kL22+Wr05G+7trgNcMHHVS7nn4B4mqf0kOfVNdXp8aJ6l7avi7W/ASp80Jf04m/HnX/PnzpWnTphIeHi7dunWTL7/80t9dQjGzem6UHnGZ8sTG/5OBK38hn3/fSBYPWCvRNbL93TXAK3oNOi0PzTgi786JkUf6t5JD34TL80sPSeQ1ef7uGrzEKQ7bzRObNm2SgQMHSmxsrDgcDlm9erXb56NGjdLbi7Y77rijQmKm34P/8uXLZeLEiTJjxgzZtWuXdOrUSfr37y8//PCDv7uGn4WFXpJ+TQ/J7H/dJDuOxUr6uUiZt/sGOZwVIb9s+42/uwd4xbCHTkrK0ij55/IoSd8fLq892Uhy/+eQ/sNP+btrqKRycnJ0TFPBujQq2B89etTV/vrXv1ZIzPR78J8zZ46MHTtWHnjgAWnXrp0sXLhQatSoIYsWLfJ31/CzKiFOqRJiSe6lULftuZeqyPUNjvqtX4C3VKnqlJYdz8uuzbVd2yzLIbs315Z2Cef92jd4f4W/fBvNEwMGDJBZs2bJ0KFDS90nLCxMYmJiXK1u3boVEjP9GvwvXrwoO3fulL59+xZ2KCREv9+2bZs/u4YicvKqya7jDeS3XXZKdI0c
CXE4ZdC1/5XO0cclujr/MKLyi4jKl9AqImdOuE+DOn2yitStf8lv/YJvxvydNpq3bdiwQaKjo6V169by8MMPy48//lghMdOvE/5Onjwp+fn50qBBA7ft6v2+ffsu2z83N1e3AllZWRXST4ge63/hlg2yefif5ZLToSf8fXSohVxX74S/uwYAFSqrWOxR2btqnlIl/2HDhkmzZs3k4MGD8vTTT+tqgQrkoaHuldbyxMygme2fnJwszz77rL+7YaSMc5Hy648HS/UqeVKr6kU58b+aMrfPOsk4F+HvrgG2ZZ0KlfxLInWKZfl1612S08WqAai89KQ9O/f5y0/HxsfHu21X4+9JSUken+/+++93ve7QoYN07NhRrr32Wl0NuO2228SX/Fr2r1evnr66OX78uNt29V6NfRQ3ZcoUOXv2rKtlZGRUYG+h/O9SVR34I6rlSs+4DFl/uKm/uwTYdikvRPb/p4Z06XnOtc3hsKRzz2z5Zie3+gULy+ZMf+vn4K9iT9FYpGKTNzRv3lzHxQMHDnglZgZs8K9WrZokJCTI+vXrXducTqd+371798v2V2WViIgIt4aKoQL9LXHp0qhWlvSIzZA/3fmBHDpbR1b+t7W/uwZ4xco368mAX56SvveekvgWF2T8i5kSXsMp/1wW5e+uwctP9XPaaErxOFSekn9JMjMz9Zh/w4YNvRIzr8Tv9Sx1y0JiYqJ07dpVbrzxRnnllVf07RFqJiMCR+1quTKx65cSUzNbzuSGyz+/ayZzd9wol6zLx6WAymjjB3Ul8pp8GTn5mJ7kd+jr6vLMiGZy5mRVf3cNlVR2drZbFp+WliZ79uyRqKgo3dQw9t13362zdjXm/8QTT0iLFi30rXsFVPlf3S0wbtw4r8ZMvwf/++67T06cOCHTp0+XY8eOSefOnSUlJeWyCQ3wr7+ntdANCGYfLK6nG4JTRa/wt2PHDunTp4/rvQrcigreCxYskP/85z/yzjvvyJkzZ/RCQP369ZOZM2e6VRLURYGa6OftmOmwLMuqzDMuIyMjpfnU5yUkPNzf3QF8ouk0bntF8Lpk5ckGWaPHzn01lFsQKwb/80GpWrNauc+Tl3NR1vRb5NO+VhS/L/IDAAAqlt/L/gAAVITyrM9flJ1jAw3BHwBghKIz9svDzrGBhrI/AACGIfMHABiBzL8QwR8AYASCfyHK/gAAGIbMHwBgBDL/QgR/AIARLJu361kSPAj+AAAjkPkXYswfAADDkPkDAIxA5l+I4A8AMALBvxBlfwAADEPmDwAwApl/IYI/AMAIluXQrbzsHBtoKPsDAGAYMn8AgBHUAj92Fvlx2jg20BD8AQBGYMy/EGV/AAAMQ+YPADACE/4KEfwBAEag7F+I4A8AMAKZfyHG/AEAMAyZPwDACCpzt1O6t4Io8yf4AwCMYOkAbu/4YEHZHwAAw5D5AwCMoFboU/+VFyv8AQBQyTDbvxBlfwAADEPmDwAwgprp72CRH43gDwAwgprpb2u2vyVBg7I/AACGIfMHABiBCX+FCP4AACMQ/AsR/AEARmDCXyHG/AEA8IFNmzbJwIEDJTY2VhwOh6xevdr1WV5enjz55JPSoUMHqVmzpt5n5MiRcuTIkSueMykpSZ+raGvTpo3HfSP4AwCMmu1v2WieyMnJkU6dOsn8+fMv++z8+fOya9cumTZtmv65cuVKSU1NlUGDBl31vNddd50cPXrU1bZs2eJZxyj7AwBM8VMAtzPm79n+AwYM0K0kkZGRsm7dOrdt8+bNkxtvvFHS09OlcePGpZ63SpUqEhMTI3aQ+QMA4IGsrCy3lpubK95w9uxZXcavU6fOFffbv3+/HiZo3ry5jBgxQl8seIrgDwAwara/ZaMp8fHxOnMvaMnJybb7duHCBT0HYPjw4RIREVHqft26dZMlS5ZISkqKLFiwQNLS0uSWW26Rc+fOefR9lP0BAEZQVXs7i/RZP//MyMhwC9BhYWG2+qUm//3iF78Qy7J0
QL+SosMIHTt21BcDTZo0kRUrVsjo0aPL/J0EfwAAPKAC/5Wy8/IE/sOHD8unn37q8XnVEEGrVq3kwIEDHh1H2R8AYARvlf29pSDwqzH8Tz75RK655hqPz5GdnS0HDx6Uhg0benQcwR8AYFbd37LRPAzMe/bs0U1R4/PqtZqgpwL/PffcIzt27JB3331X8vPz5dixY7pdvHjRdY7bbrtN3wVQYNKkSbJx40b57rvvZOvWrTJ06FAJDQ3VcwU8QdkfAGAGu9m75dmxKrD36dPH9X7ixIn6Z2Jiol6s54MPPtDvO3fu7HbcZ599Jr1799avVVZ/8uRJ12eZmZk60P/4449Sv3596dmzp2zfvl2/9gTBHwAAH1ABXE3iK82VPiugMvyili1b5pW+EfwBAEYozyp9Rdk5NtAQ/AEARuCpfoWY8AcAgGHI/AEAZlCZewVO+AtkBH8AgBEY8y9E2R8AAMOQ+QMAzOCtxf1NCf4FCxGUxaBBg+z0BwAAn2C2v4fBf8iQIWXZTT+HWC1RCAAAKnnwdzqdvu8JAAC+FkSle7+N+V+4cEHCw8NtdQAAgIpA2d/GbH9V1p85c6bExcVJrVq15NChQ3r7tGnT5O233/b0dAAABOVT/YIq+D///POyZMkSeemll6RatWqu7e3bt5e33nrL2/0DAAD+Dv5/+tOf5M0335QRI0boZwgX6NSpk+zbt8/b/QMAwEscXmiGjvl///330qJFixInBebl5XmrXwAAeBf3+Zc/82/Xrp1s3rz5su3vv/++dOnSxdPTAQCAQM/8p0+fLomJiboCoLL9lStXSmpqqh4OWLt2rW96CQCAXWT+5c/8Bw8eLB9++KF88sknUrNmTX0x8O233+ptt99+u6enAwCgYp/qZ9loJt/nf8stt8i6deu83xsAABC4i/zs2LFDZ/wF8wASEhK82S8AALyKR/raCP6ZmZkyfPhw+fzzz6VOnTp625kzZ6RHjx6ybNkyadSokaenBADA9xjzL/+Y/5gxY/QtfSrrP3XqlG7qtZr8pz4DAABBlvlv3LhRtm7dKq1bt3ZtU6//8Ic/6LkAAAAEJLuT9iyDJ/zFx8eXuJiPWvM/NjbWW/0CAMCrHNZPrbzsHFvpy/6zZ8+W8ePH6wl/BdTrCRMmyO9//3tv9w8AAO/gwT6eZf5169YVh6Ow3JGTkyPdunWTKlV+OvzSpUv69YMPPihDhgwpyykBAEAgB/9XXnnF9z0BAMCXGPP3LPir5XwBAKjUuNXP/iI/yoULF+TixYtu2yIiIuycEgAABNqEPzXeP27cOImOjtZr+6v5AEUbAAABiQl/5Q/+TzzxhHz66aeyYMECCQsLk7feekueffZZfZuferIfAAABieBf/rK/enqfCvK9e/eWBx54QC/s06JFC2nSpIm8++67MmLECE9PCQAAAjnzV8v5Nm/e3DW+r94rPXv2lE2bNnm/hwAAeAOP9C1/8FeBPy0tTb9u06aNrFixwlURKHjQDwAAgbrCn8NGMzb4q1L/v//9b/36qaeekvnz50t4eLg89thjMnnyZF/0EQAA+DP4qyD/u9/9Tr/u27ev7Nu3T5YuXSq7d+/WS/wCABCQKnjC36ZNm2TgwIF6QrxaJXf16tXu3bEsmT59ujRs2FCqV6+uY+r+/fuvel6VdDdt2lQn3mq13S+//NL3wb84NdFv2LBh0rFjR7unAgAgaOTk5EinTp10sC7JSy+9JK+99posXLhQvvjiC337fP/+/fUaOqVZvny5TJw4UWbMmCG7du3S51fH/PDDD96f7a86V1YFVQEAAAKJmq5n66l+4pkBAwboVhKV9aul86dOnSqDBw/W29SddA0aNNAVgvvvv7/E4+bMmSNjx47VQ/CKunD46KOPZNGiRXoo3qvBf+7cuWU6mSprEPwBAMEsKyvL7b1a80Y1T6iJ88eOHdOl/gKRkZG6jL9t27YSg79aUXfnzp0yZcoU17aQkBB9DnWMJ8oU/Atm
9weqxrO+lCqOqv7uBuAT/ziyx99dAHwm65xT6raqXA/2iY+Pd9usSvBJSUkenUoFfkVl+kWp9wWfFXfy5EnJz88v8Rg1/67C1vYHAMC0B/tkZGS4PcfG06w/ENie8AcAgEkiIiLcWnmCf0xMjP55/Phxt+3qfcFnxdWrV09CQ0M9OqY0BH8AgBkCaG3/Zs2a6YC9fv16t7kEatZ/9+7dSzymWrVqkpCQ4HaM0+nU70s7pjSU/QEARrC7Sp/Dw2Ozs7PlwIEDbvPn9uzZI1FRUdK4cWN59NFHZdasWdKyZUt9MTBt2jS9JsCQIUNcx9x2220ydOhQ/TRdRd3ml5iYKF27dpUbb7xR3zGgbiksmP1fVgR/AAB8YMeOHdKnTx/XexW4FRW8lyxZop+SqwL3Qw89JGfOnNHPyElJSdGL9xQ4ePCgnuhX4L777pMTJ07oxYHUxMDOnTvrY4pPArwah6VuNvTQ5s2b5Y033tCdev/99yUuLk7+/Oc/6ysX1fmKokok6taI3jKY2f4IWsz2R/DP9j8kZ8+edZtE54tY0XTW8xJSJLB6ynnhgnw39Rmf9rWieDzm/7e//U2vJqSWIlRL+ubm5urt6o/xwgsv+KKPAAAE1Zh/pQv+anxCrSj0xz/+UapWLcy2b775Zr3UIAAACGwej/mnpqbKrbfeetl2VVJRYxYAAASiip7wF1SZv7o1oejsxQJbtmyR5s2be6tfAAD4ZoU/y0YzNfirBwqoR/eqexHVWv5HjhyRd999VyZNmiQPP/ywb3oJAIBdjPmXv+yvnhqkFhVQ9x6eP39eDwGo1Y1U8B8/frynpwMAAIEe/FW2/8wzz8jkyZN1+V8tYtCuXTupVauWb3oIAIAXMObvhUV+1DKDKugDAGDSg32MDP5qtSKV/Zfm008/tdsnAAAQSMFfLSVYVF5enl6reO/evXrJQgAAApLNsr+YnPnPnTu3xO1JSUl6/B8AgIBE2d/7j/T91a9+JYsWLfLW6QAAgI947al+27Ztc3sSEQAAAYXMv/zBf9iwYW7v1UMBjx49qh9dqJ5FDABAIOJWPxvBX63hX1RISIi0bt1annvuOenXr5+npwMAAIEc/PPz8+WBBx6QDh06SN26dX3XKwAAEBgT/kJDQ3V2z9P7AACVDmv7l3+2f/v27eXQoUOeHgYAQECM+TtsNGOD/6xZs/RDfNauXasn+mVlZbk1AAAQJGP+akLf448/Lnfeead+P2jQILdlftWsf/VezQsAACAgBVH2XiHB/9lnn5Xf/OY38tlnn9n6QgAA/IL7/D0P/iqzV3r16lXWQwAAQGW/1e9KT/MDACCQschPOYN/q1atrnoBcOrUKU9OCQBAxaDsX77gr8b9i6/wBwAAgjj433///RIdHe273gAA4COU/csR/BnvBwBUapT9PV/kp2C2PwAAMCTzdzqdvu0JAAC+ROZf/kf6AgBQGTHmX4jgDwAwA5l/+R/sAwAAKjcyfwCAGcj8XQj+AAAjMOZfiLI/AACGIfgDAMwq+1s2mgeaNm2qF8gr3h555JES91+yZMll+4aHh4svUPYHABihosv+//rXvyQ/P9/1fu/evXL77bfLvffeW+oxERERkpqa6vPVdQn+AAD4QP369d3ev/jii3LttddKr169Sj1GBfuYmBjxNcr+AAAzeKnsn5WV5dZyc3Ov+tUXL16Uv/zlL/Lggw9eMZvPzs6WJk2aSHx8vAwePFi+/vpr8QWCPwDADF4K/vHx8frx9gUtOTn5ql+9evVqOXPmjIwaNarUfVq3bi2LFi2SNWvW6AsFtax+jx49JDMzU7yNsj8AAB7IyMjQY/MFwsLCrnrM22+/LQMGDJDY2NhS9+nevbtuBVTgb9u2rbzxxhsyc+ZM8SaCPwDACKrYbmf6nOPnnyrwFw3+V3P48GH55JNPZOXKlR59X9WqVaVLly5y4MAB8TbK/gAAM1TwrX4FFi9eLNHR0XLXXXeJJ9SdAl999ZU0
bNhQvI3MHwBgBH+s8Od0OnXwT0xMlCpV3EPuyJEjJS4uzjVn4LnnnpObbrpJWrRooecHzJ49W1cNxowZI95G8AcAwEdUuT89PV3P8i9ObQ8JKSzAnz59WsaOHSvHjh2TunXrSkJCgmzdulXatWvn9X4R/AEAZvDDg3369esnllXygRs2bHB7P3fuXN0qAsEfAGCOIHo4jx1M+AMAwDBk/gAAI/BI30IEfwCAGfww5h+oKPsDAGAYMn8AgBEo+xci+AMAzEDZ34WyPwAAhiHzBwAYgbJ/IYI/AMAMlP1dCP4AADMQ/F0Y8wcAwDBk/gAAIzDmX4jgDwAwA2V/F8r+AAAYhswfAGAEh2XpVl52jg00BH8AgBko+7tQ9gcAwDBk/gAAIzDbvxDBHwBgBsr+LpT9AQAwDJk/AMAIlP0LEfwBAGag7O9C8AcAGIHMvxBj/gAAGIbMHwBgBsr+LgR/AIAxgql0bwdlfwAADEPmDwAwg3owj52H81jBUzYg+AMAjMBs/0KU/QEAMAyZPwDADMz2dyH4AwCM4HD+1MrLzrGBhrI/AACGIfNHmQ0cdVLuefgHiap/SQ59U11enxonqXtq+LtbgMeW/SFaPv+4jmQcCJNq4U5p1/W8jH7miMS3yHXtc/GCQ958NlY2fFBX8nIdktD7nIxPzpS69S/5te+wgbJ/YGT+mzZtkoEDB0psbKw4HA5ZvXq1P7uDK+g16LQ8NOOIvDsnRh7p30oOfRMuzy89JJHX5Pm7a4DH/rOtlr6YfWXtfkledlDyL4k8PfxauXC+8J/EhUlxsn1dpEx94zv5/coDcup4VXludFO/9hveme3vsNE8kZSUpGNb0damTZsrHvPee+/pfcLDw6VDhw7y8ccfS9AF/5ycHOnUqZPMnz/fn91AGQx76KSkLI2Sfy6PkvT94fLak40k938O6T/8lL+7BnjshaWHpN99p6Rp6wty7XUX5PFX0uWH76vJ/v9U15/nZIXIP/4aJf8v6Xvp3DNbWnb8n0ycky7f7Kgl3+6k2lXp7/O3bDQPXXfddXL06FFX27JlS6n7bt26VYYPHy6jR4+W3bt3y5AhQ3Tbu3evBFXZf8CAAbohsFWp6pSWHc/LsnnRrm2W5ZDdm2tLu4Tzfu0b4A05WaH6Z+06+frn/v/UkEt5IdLllmzXPo1b5kp03EX5dmdNacv/9yijKlWqSExMTJn2ffXVV+WOO+6QyZMn6/czZ86UdevWybx582ThwoVi7IS/3NxcycrKcmvwvYiofAmtInLmhPu14umTVRj/RKXndIosnBEn192QLU3bXNDbTv1QRapWc0qtyJ8uBgrUqZ+nP4PZZf+sYnFIxabS7N+/Xw9tN2/eXEaMGCHp6eml7rtt2zbp27ev27b+/fvr7d5WqYJ/cnKyREZGulp8fLy/uwSgkpv3dCM5vK+6TFlw2N9dQUVN+LNsNBEde4rGIhWbStKtWzdZsmSJpKSkyIIFCyQtLU1uueUWOXfuXIn7Hzt2TBo0aOC2Tb1X272tUl3CTpkyRSZOnOh6r664uADwvaxToXpCVJ1iWX7depfkdLFqAFCZzHs6Tr5YFyEvrzog9WMLJ69GRV+SvIshkn021C37P3Oiqv4MZsvIyJCIiAjX+7CwsBL3Kzqs3bFjR30x0KRJE1mxYoUe1/enSpX5qz+w+oMXbfA9NfapxkC79Cy8WnU4LD0R6hsmP6ESUvO2VODfmhIpL713QGIaX3T7XM1xUXNddm+p5dqmbgtUkwLbJuT4occIpLJ/RLE4VFrwL65OnTrSqlUrOXDgQImfq7kBx48fd9um3pd1zkDQBn/4z8o368mAX56SvveekvgWF2T8i5kSXsMp/1wW5e+uAeUq9X+6Mkqemn9Yqtdy6nF81dQdLErNCKe+k+XNpDjZ83ktfRfAy4811oGfyX6VmB9m+xeVnZ0tBw8elIYNG0pJunfvLuvXr3fbpib8qe3e5tearfpDFL0CUuMhe/bs
kaioKGncuLE/u4ZiNn5QVyKvyZeRk4/pSX6Hvq4uz4xoJmdOVvV31wCPrX2nnv45+e6Wbtsfn5uubwFUfpP0vYQ4LJk5tqle5Kdr73MyLjnTL/1F5TRp0iS9lo0q9R85ckRmzJghoaGh+nY+ZeTIkRIXF+eaMzBhwgTp1auXvPzyy3LXXXfJsmXLZMeOHfLmm28GV/BXv1SfPn1c7wvG8xMTE/UkCQSWDxbX0w2o7P5xZM9V96kWbsm45O91Q3Co6Ef6ZmZm6kD/448/Sv369aVnz56yfft2/VpRM/9DQgoL8D169JClS5fK1KlT5emnn5aWLVvqxe/at28vQRX8e/fuLZbNMgoAAIG4vO+yZcuu+PmGDRsu23bvvffq5muM+QMAYBju0wIAGKGiy/6BjOAPADCD0/qplZedYwMMwR8AYAYe6evCmD8AAIYh8wcAGEEt4WRrzF+CB8EfAGAGu6v0WcFT96fsDwCAYcj8AQBG4Fa/QgR/AIAZmO3vQtkfAADDkPkDAIzgsCzdysvOsYGG4A8AMIPz51Zedo4NMJT9AQAwDJk/AMAIlP0LEfwBAGZgtr8LwR8AYAZW+HNhzB8AAMOQ+QMAjMAKf4UI/gAAM1D2d6HsDwCAYcj8AQBGcDh/auVl59hAQ/AHAJiBsr8LZX8AAAxD5g8AMAOL/LgQ/AEARmB530KU/QEAMAyZPwDADEz4cyH4AwDMoGK3ndv1LAkaBH8AgBEY8y/EmD8AAIYh8wcAGHSrn50xfwkaBH8AgBmY8OdC2R8AAMOQ+QMAzKBm+jtsHh8kyPwBAEbN9nfYaJ5ITk6WG264QWrXri3R0dEyZMgQSU1NveIxS5YsEYfD4dbCw8PF2wj+AAD4wMaNG+WRRx6R7du3y7p16yQvL0/69esnOTk5VzwuIiJCjh496mqHDx/2et8o+wMAzFDBE/5SUlIuy+pVBWDnzp1y6623lnqcyvZjYmLEl8j8AQBmBX/LRrPh7Nmz+mdUVNQV98vOzpYmTZpIfHy8DB48WL7++mvxNoI/AAAeyMrKcmu5ublXPcbpdMqjjz4qN998s7Rv377U/Vq3bi2LFi2SNWvWyF/+8hd9XI8ePSQzM1O8ieAPADCDlzL/+Ph4iYyMdDU1se9q1Nj/3r17ZdmyZVfcr3v37jJy5Ejp3Lmz9OrVS1auXCn169eXN954Q7yJMX8AgBm8dKtfRkaGnpRXICws7IqHjRs3TtauXSubNm2SRo0aefSVVatWlS5dusiBAwfEmwj+AAAjeOvBPhEREW7BvzSWZcn48eNl1apVsmHDBmnWrJnH35mfny9fffWV3HnnneJNBH8AAHxAlfqXLl2qx+/Vvf7Hjh3T29VQQfXq1fVrVeKPi4tzDR0899xzctNNN0mLFi3kzJkzMnv2bH2r35gxY7zaN4I/AMAMFXyr34IFC/TP3r17u21fvHixjBo1Sr9OT0+XkJDC6XenT5+WsWPH6guFunXrSkJCgmzdulXatWsn3kTwBwCYwWmp2r3YOt4Dqux/NWo4oKi5c+fq5mvM9gcAwDBk/gAAM/BIXxeCPwDAEHZX6bMkWFD2BwDAMGT+AAAzUPZ3IfgDAMygZ+tX3Gz/QEbZHwAAw5D5AwDMYDl/auVl59gAQ/AHAJiBMX8Xgj8AwAyM+bsw5g8AgGHI/AEAZqDs70LwBwCYQVf97QR/CRqU/QEAMAyZPwDADJT9XQj+AAAzONV9+jbu1dfHBwfK/gAAGIbMHwBgBsr+LgR/AIAZCP4ulP0BADAMmT8AwAws7+tC8AcAGMGynLqVl51jAw3BHwBgBjVmbyd7t4In82fMHwAAw5D5AwDMoDN3Mn+F4A8AMINaoc9hY9w+iMb8KfsDAGAYMn8AgBko+7sQ/AEARrCcTrFslP0tyv4AAKCyIvMHAJiBsr8LwR8AYAa1wI+D4K9Q9gcAwDBk/gAAM+iyvZ37/Cn7AwBQqVhOSywbZX+L4A8AQCWjb9VjhT+F
MX8AAHxo/vz50rRpUwkPD5du3brJl19+ecX933vvPWnTpo3ev0OHDvLxxx97vU8EfwCAOWV/m81Ty5cvl4kTJ8qMGTNk165d0qlTJ+nfv7/88MMPJe6/detWGT58uIwePVp2794tQ4YM0W3v3r3iTQR/AIA5ZX+7zUNz5syRsWPHygMPPCDt2rWThQsXSo0aNWTRokUl7v/qq6/KHXfcIZMnT5a2bdvKzJkz5frrr5d58+aJN1XqMf+CyReXJM/Wug1AIMs6FzxLigLFZWU7K2wynd1YcUkdr/qcleW2PSwsTLfiLl68KDt37pQpU6a4toWEhEjfvn1l27ZtJX6H2q4qBUWpSsHq1avFmyp18D937pz+uUW8Px4CBIq6rfzdA6Bi/j2PjIz0ybmrVasmMTExsuWY/VhRq1YtiY+Pd9umSvpJSUmX7Xvy5EnJz8+XBg0auG1X7/ft21fi+Y8dO1bi/mq7N1Xq4B8bGysZGRlSu3ZtcTgc/u6OEdQVr/ofX/3dIyIi/N0dwKv4/7viqYxfBX7177mvqIlzaWlpOhP3Rn8dxeJNSVl/oKvUwV+VTxo1auTvbhhJ/cPIP44IVvz/XbF8lfEXvwBQrSLVq1dPQkND5fjx427b1XtViSiJ2u7J/uXFhD8AAHw03JCQkCDr1693bXM6nfp99+7dSzxGbS+6v7Ju3bpS9zcy8wcAIJBNnDhREhMTpWvXrnLjjTfKK6+8Ijk5OXr2vzJy5EiJi4uT5ORk/X7ChAnSq1cvefnll+Wuu+6SZcuWyY4dO+TNN9/0ar8I/vCIGttSk1sq4xgXcDX8/w1vu+++++TEiRMyffp0PWmvc+fOkpKS4prUl56eroewC/To0UOWLl0qU6dOlaefflpatmypZ/q3b9/eq/1yWMG0WDEAALgqxvwBADAMwR8AAMMQ/AEAMAzBHwAAwxD84bPHUgKVxaZNm2TgwIF6lTm1epu311EHAg3BHz55LCVQmaj7rtX/0+oCFzABt/qhTFSmf8MNN7geK6lWqVJroI8fP16eeuopf3cP8BqV+a9atUo/Qx0IVmT+uKqCx1Kqx1CW9bGUAIDARfDHVV3psZTefswkAMD3CP4AABiG4A+fPJYSABC4CP7wyWMpAQCBi6f6wSuPpQQqs+zsbDlw4IDrfVpamuzZs0eioqKkcePGfu0b4Avc6ocyU7f5zZ492/VYytdee03fAghUdhs2bJA+ffpctl1d8C5ZssQvfQJ8ieAPAIBhGPMHAMAwBH8AAAxD8AcAwDAEfwAADEPwBwDAMAR/AAAMQ/AHAMAwBH/AplGjRrk9+713797y6KOP+mWhGvUs+jNnzpS6j/p89erVZT5nUlKSXtDJju+++05/r1oxD0BgIPgjaAOyCjiqqWcTtGjRQp577jm5dOmSz7975cqVMnPmTK8FbADwNtb2R9C64447ZPHixZKbmysff/yxPPLII1K1alWZMmXKZftevHhRXyR4g1oPHgACGZk/glZYWJh+5HCTJk3k4Ycflr59+8oHH3zgVqp//vnnJTY2Vlq3bq23Z2RkyC9+8QupU6eODuKDBw/WZesC+fn5+iFH6vNrrrlGnnjiCSm+Qnbxsr+6+HjyySclPj5e90lVId5++2193oL15OvWrasrAKpfBU9NTE5OlmbNmkn16tWlU6dO8v7777t9j7qgadWqlf5cnadoP8tK9Uudo0aNGtK8eXOZNm2a5OXlXbbfG2+8ofuv9lN/n7Nnz7p9/tZbb0nbtm0lPDxc2rRpI6+//rrHfQFQcQj+MIYKkirDL6AeSZyamirr1q2TtWvX6qDXv39/qV27tmzevFk+//xzqVWrlq4gFBz38ssv6we9LFq0SLZs2SKnTp2SVatWXfF7R44cKX/961/1g5C+/fZbHUjVeVUw/dvf/qb3Uf04evSovPrqq/q9Cvx/+tOfZOHChfL111/LY489Jr/61a9k48aNrouUYcOGycCBA/VY+pgxY+Sp
p57y+G+iflf1+3zzzTf6u//4xz/K3Llz3fZRT7tbsWKFfPjhh5KSkiK7d++W3/72t67P3333XZk+fbq+kFK/3wsvvKAvIt555x2P+wOggqgH+wDBJjEx0Ro8eLB+7XQ6rXXr1llhYWHWpEmTXJ83aNDAys3NdR3z5z//2WrdurXev4D6vHr16tY//vEP/b5hw4bWSy+95Po8Ly/PatSokeu7lF69elkTJkzQr1NTU1VZQH9/ST777DP9+enTp13bLly4YNWoUcPaunWr276jR4+2hg8frl9PmTLFateundvnTz755GXnKk59vmrVqlI/nz17tpWQkOB6P2PGDCs0NNTKzMx0bfv73/9uhYSEWEePHtXvr732Wmvp0qVu55k5c6bVvXt3/TotLU1/7+7du0v9XgAVizF/BC2VzasMW2X0qoz+y1/+Us9eL9ChQwe3cf5///vfOstV2XBRFy5ckIMHD+pSt8rOiz7GuEqVKtK1a9fLSv8FVFYeGhoqvXr1KnO/VR/Onz8vt99+u9t2VX3o0qWLfq0y7OKPU+7evbt4avny5boioX4/9Ux7NSEyIiLCbR/1PPu4uDi371F/T1WtUH8rdezo0aNl7Nixrn3UeSIjIz3uD4CKQfBH0FLj4AsWLNABXo3rq0BdVM2aNd3eq+CXkJCgy9jF1a9fv9xDDZ5S/VA++ugjt6CrqDkD3rJt2zYZMWKEPPvss3q4QwXrZcuW6aENT/uqhguKX4yoix4AgYngj6ClgruaXFdW119/vc6Eo6OjL8t+CzRs2FC++OILufXWW10Z7s6dO/WxJVHVBZUlq7F6NeGwuILKg5pIWKBdu3Y6yKenp5daMVCT6womLxbYvn27eGLr1q16MuQzzzzj2nb48OHL9lP9OHLkiL6AKviekJAQPUmyQYMGevuhQ4f0hQSAyoEJf8DPVPCqV6+enuGvJvylpaXp+/B/97vfSWZmpt5nwoQJ8uKLL+qFcvbt26cnvl3pHv2mTZtKYmKiPPjgg/qYgnOqCXSKCr5qlr8aojhx4oTOpFUpfdKkSXqSn5o0p8rqu3btkj/84Q+uSXS/+c1vZP/+/TJ58mRdfl+6dKmeuOeJli1b6sCusn31Har8X9LkRTWDX/0OalhE/V3U30PN+Fd3UiiqcqAmKKrj//vf/8pXX32lb7GcM2eOR/0BUHEI/sDP1G1smzZt0mPcaia9yq7VWLYa8y+oBDz++OPy61//WgdDNfatAvXQoUOveF419HDPPffoCwV1G5waG8/JydGfqbK+Cp5qpr7KoseNG6e3q0WC1Ix5FVRVP9QdB2oYQN36p6g+qjsF1AWFug1Q3RWgZtl7YtCgQfoCQ32nWsVPVQLUdxanqifq73HnnXdKv379pGPHjm638qk7DdStfirgq0qHqlaoC5GCvgIIPA4168/fnQAAABWHzB8AAMMQ/AEAMAzBHwAAwxD8AQAwDMEfAADDEPwBADAMwR8AAMMQ/AEAMAzBHwAAwxD8AQAwDMEfAADDEPwBABCz/H/7/EzFrVqrTwAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:\n",
      "  PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='{\\n  \"la...: None}, annotations=[]), input_type=Message])\n",
      "  PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n",
      "  return self.__pydantic_serializer__.to_python(\n",
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:\n",
      "  PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='{\\n  \"la...: None}, annotations=[]), input_type=Message])\n",
      "  PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n",
      "  return self.__pydantic_serializer__.to_python(\n",
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:\n",
      "  PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='{\\n  \"la...: None}, annotations=[]), input_type=Message])\n",
      "  PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n",
      "  return self.__pydantic_serializer__.to_python(\n"
     ]
    }
   ],
   "source": [
    "# Build the judge task from the few-shot prompt, using gpt-4.1-nano as the judge model\n",
    "evaluator_task = create_task_function(eval_prompt, model=\"gpt-4.1-nano\")\n",
    "print(\"Running evaluator experiment...\")\n",
    "# Run the judge on every dev example; each evaluator compares the judge's verdict to the ground-truth label\n",
    "experiment = run_experiment(\n",
    "    dataset=dev_dataset,\n",
    "    task=evaluator_task,\n",
    "    evaluators=[eval_tp, eval_tn, eval_fp, eval_fn, accuracy],\n",
    "    concurrency=3,\n",
    ")\n",
    "experiment_id = experiment.id\n",
    "print(f\"Experiment completed! Experiment ID: {experiment_id}\")\n",
    "\n",
    "# Retrieve the per-example results and compute TPR, TNR, and balanced accuracy\n",
    "results = retrieve_results(experiment_id)\n",
    "compute_metrics(results)"
   ]
  },
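  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experiment summary reports each evaluator's mean score over the `n` examples, so `avg_score * n` for `eval_tp`, `eval_tn`, `eval_fp`, and `eval_fn` recovers the confusion-matrix counts. On the 29-example dev set this gives TP = 20, TN = 9, FP = 0, and FN = 0.\n",
    "\n",
    "The next cell is a minimal, illustrative sketch of how TPR, TNR, and balanced accuracy follow from those counts. The helper `rates_from_counts` is hypothetical and is not part of the reference implementation; it assumes at least one positive and one negative example, and its results should agree with the values `compute_metrics` printed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch (hypothetical helper, not part of the reference implementation):\n",
    "# derive the judge metrics from confusion-matrix counts.\n",
    "def rates_from_counts(tp, tn, fp, fn):\n",
    "    if tp + fn == 0 or tn + fp == 0:\n",
    "        raise ValueError(\"need at least one positive and one negative example\")\n",
    "    tpr = tp / (tp + fn)  # fraction of actual positives the judge labels positive\n",
    "    tnr = tn / (tn + fp)  # fraction of actual negatives the judge labels negative\n",
    "    balanced_accuracy = (tpr + tnr) / 2  # mean of the two per-class recalls\n",
    "    return tpr, tnr, balanced_accuracy\n",
    "\n",
    "# Dev-set counts read off the experiment summary above (avg_score * n)\n",
    "tpr, tnr, bal_acc = rates_from_counts(tp=20, tn=9, fp=0, fn=0)\n",
    "print(f\"TPR={tpr:.3f}  TNR={tnr:.3f}  Balanced Accuracy={bal_acc:.3f}\")"
   ]
  },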
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test Set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome! With these prompt changes, the judge reaches 100% accuracy on the dev set. Next, let's run it on the held-out test set to check that it generalizes beyond the examples we tuned it on and has not overfit to the train and dev data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/priyanjindal/recipe-chatbot/.venv/lib/python3.13/site-packages/phoenix/utilities/client.py:60: UserWarning: The Phoenix server (11.13.2) and client (11.18.0) versions are mismatched and may have compatibility issues.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running evaluator experiment...\n",
      "🧪 Experiment started.\n",
      "📺 View dataset experiments: http://127.0.0.1:6006/datasets/RGF0YXNldDo1/experiments\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDo1/compare?experimentId=RXhwZXJpbWVudDoxNQ==\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running tasks |█████████▍| 31/33 (93.9%) | ⏳ 02:14<00:07 |  3.58s/it"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Task runs completed.\n",
      "🧠 Evaluation started.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running tasks |██████████| 33/33 (100.0%) | ⏳ 02:15<00:00 |  4.10s/it\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "🔗 View this experiment: http://127.0.0.1:6006/datasets/RGF0YXNldDo1/compare?experimentId=RXhwZXJpbWVudDoxNQ==\n",
      "\n",
      "Experiment Summary (08/05/25 11:00 AM -0700)\n",
      "--------------------------------------------\n",
      "  evaluator   n  n_scores  avg_score  n_labels               top_2_labels\n",
      "0  accuracy  33        33   0.969697        33   {'True': 32, 'False': 1}\n",
      "1   eval_fn  33        33   0.030303        33   {'False': 32, 'True': 1}\n",
      "2   eval_fp  33        33   0.000000        33              {'False': 33}\n",
      "3   eval_tn  33        33   0.303030        33  {'False': 23, 'True': 10}\n",
      "4   eval_tp  33        33   0.666667        33  {'True': 22, 'False': 11}\n",
      "\n",
      "Tasks Summary (08/05/25 10:59 AM -0700)\n",
      "---------------------------------------\n",
      "   n_examples  n_runs  n_errors\n",
      "0          33      33         0\n",
      "Experiment completed! Experiment ID: RXhwZXJpbWVudDoxNQ==\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "<span style=\"font-weight: bold\">Judge Performance on Dev Set:</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "\u001b[1mJudge Performance on Dev Set:\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Positive Rate <span style=\"font-weight: bold\">(</span>TPR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.957</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Positive Rate \u001b[1m(\u001b[0mTPR\u001b[1m)\u001b[0m: \u001b[1;36m0.957\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span> Negative Rate <span style=\"font-weight: bold\">(</span>TNR<span style=\"font-weight: bold\">)</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.000</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3;92mTrue\u001b[0m Negative Rate \u001b[1m(\u001b[0mTNR\u001b[1m)\u001b[0m: \u001b[1;36m1.000\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Balanced Accuracy: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.978</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Balanced Accuracy: \u001b[1;36m0.978\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "running experiment evaluations |██████████| 165/165 (100.0%) | ⏳ 00:19<00:00 |  8.62it/s\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAf8AAAGwCAYAAACn/2wHAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMyJJREFUeJzt3Ql0VNX9wPHfJJAEMAk7IRI2WSOrUWkQBSqC2IOAtlWKsgj4rxWLIiCoLEo1rVRABcGqiFgpoBVUtLGIsgloCWBFBVkCCYWwCiFBEpiZ/7kXZ5KBBDJ5M5nlfj8992Tem/fe3KQefnN/9/fuszmdTqcAAABjRAS6AwAAoGIR/AEAMAzBHwAAwxD8AQAwDMEfAADDEPwBADAMwR8AAMNUkhDmcDjkwIEDEhsbKzabLdDdAQB4SS01c+rUKUlMTJSICP+NR8+cOSOFhYWWrxMVFSUxMTES6kI6+KvAn5SUFOhuAAAsys7OlgYNGvgt8DdpdIXkHLZbvlZCQoJkZmaG/BeAkA7+asSvNB43USKiQ/v/CKA0Dad+FeguAH5zTs7KOvnY/e+5P6gRvwr8+zIaS1xs+bMLuacc0ihlr74ewT+AXKl+FfgjQvz/CKA0lWyVA90FwH9+XmC+IqZur4i16VZeDgmf6eWQDv4AAJSV3ekQu9Pa+eGC4A8AMIJDnLqVl5Vzgw23+gEAYBhG/gAAIzj0/6ydHy4I/gAAI9idTt3Ky8q5wYa0PwAAhmHkDwAwAgV/RQj+AAAjqOBtJ/hrpP0BADAMI38AgBFI+xch+AMAjEC1fxHS/gAAGIaRPwDACGqJHmuL/IQPgj8AwAh2i9X+dub8AQAILeqJftae6idhgzl/AAAMw8gfAGAE5vyLEPwBAEZwiE3sYrN0frgg7Q8AgGEY+QMAjOBwnm/lZeXcYEPwBwAYwW4x7W8n7Q8AAEIVI38AgBEY+Rch+AMAjOBw2nQrLyvnBhvS/gAAGIaRPwDACKT9ixD8AQBGsEuEbuVll/BB2h8AYATnz3P+jnI2db430tLS5LrrrpPY2FipW7eu9OvXT3bs2OFxzJkzZ+TBBx+UWrVqyRVXXCF33nmnHDp06DK/h1MmTZok9evXlypVqkiPHj1k586dXvWN4A8AgB+sXr1aB/aNGzfKihUr5OzZs9KzZ0/Jz893H/PII4/Ihx9+KO+8844+/sCBA3LHHXdc8rrPPfecvPjiizJ37lz58ssvpVq1atKrVy/9RaKsSPsDAIxQ0XP+6enpHtvz58/XGYCMjAy56aab5OTJk/L666/LwoUL5Ze//KU+5o033pDWrVvrLwy/+MUvShz1z5w5U5588knp27ev3rdgwQKpV6+eLFu2TO6+++4y9Y2RPwDACHZnhOWm5ObmerSCggIpCxXslZo1a+qf6kuAygaotL1Lq1atpGHDhrJhw4YSr5GZmSk5OTke58THx0unTp1KPackBH8AALyQlJSkA66rqbn9y3E4HPLwww/LDTfcIG3atNH7VBCPioqS6tWrexyrRvHqvZK49qtjynpOSUj7AwCMoB7J67Aw5nXI+Sf7ZGdnS1xcnHt/dHT0Zc9Vc//btm2TdevWSTBg5A8AMGrO326hKSrwF2+XC/4jR46U5cuXy+effy4NGjRw709ISJDCwkI5ceKEx/Gq2l+9VxLX/gvvCLjUOSUh+AMA4AeqOE8F/qVLl8pnn30mTZo08Xg/JSVFKleuLCtXrnTvU7cCZmVlSWpqaonXVNdQQb74OaruQFX9l3ZOSUj7AwCMULxorzzszvNp/7JSqX5Vyf/+++/re/1dc/KqTkDdn69+Dhs2TEaPHq2LAFUW4aGHHtJBvHilvyoCVHUF/fv3F5vNpmsH/vSnP0nz5s31l4GJEydKYmKiXkegrAj+AACD5vwtPNjHy3PnzJmjf3br1s1jv7qdb8iQIfr1jBkzJCIiQi/uo+4aUPfrv/zyyx7Hq2yA
604BZdy4cXqtgPvvv19PGXTp0kXfVhgTE1PmvtmcKi8RolSqQ31zajrxGYnw4pcGQknjJ8t++w4Qas45z8oqeV8Ht+JFdP6IFf/8uoVUi40s93XyT9nlzvY/+LWvFYWRPwDACA6La/s7fq72DwcEfwCAESp6zj+YEfwBAMaM/H1xn3844FY/AAAMw8gfAGAEu9OmW3lZOTfYEPwBAEawWyz4s5P2BwAAoYqRPwDACA5nhG7l5aDaHwCA0ELavwhpfwAADMPIHwBgBIfFin2HhA+CPwDACNYX+YmQcBE+vwkAACgTRv4AACNYX9s/QsIFwR8AYASH2HQrLyvnBhuCPwDACIz8i4TPbwIAAMqEkT8AwAjWF/mJkHBB8AcAGMHhtOlWXlbODTbh8zUGAACUCSN/AIAR1CI9VlL3jjAaLxP8AQBGsP5UvwgJF+HzmwAAgDJh5A8AMIJdbLqVl5Vzgw3BHwBgBNL+RcLnNwEAAGXCyB8AYAS7xdS9XcIHwR8AYATS/kUI/gAAI/BgnyLh85sAAIAyYeQPADCCU2zisDDn7wyjW/0Y+QMAjEr72y00b6xZs0b69OkjiYmJYrPZZNmyZR7vq30ltWnTppV6zSlTplx0fKtWrbz+WxD8AQDwg/z8fGnfvr3Mnj27xPcPHjzo0ebNm6eD+Z133nnJ61599dUe561bt87rvpH2BwAYoaIf6du7d2/dSpOQkOCx/f7770v37t2ladOml7xupUqVLjrXWwR/AIAR7Baf6mf/+dzc3FyP/dHR0bpZcejQIfnoo4/kzTffvOyxO3fu1FMJMTExkpqaKmlpadKwYUOvPo+0PwAAXkhKSpL4+Hh3U8HXKhX0Y2Nj5Y477rjkcZ06dZL58+dLenq6zJkzRzIzM+XGG2+UU6dOefV5jPwBAEbwVdo/Oztb4uLi3PutjvoVNd8/cOBAPZq/lOLTCO3atdNfBho1aiRLliyRYcOGlfnzCP4AACM4JEK38nKdqwJ/8eBv1dq1a2XHjh2yePFir8+tXr26tGjRQnbt2uXVeaT9AQAIoNdff11SUlL0nQHeysvLk927d0v9+vW9Oo/gDwAwgt1ps9y8Dcxbt27VTVHz8+p1VlaW+xhVPPjOO+/I8OHDS7zGzTffLLNmzXJvjxkzRlavXi179+6V9evXS//+/SUyMlIGDBjgVd9I+wMAjFDRt/pt2rRJ37rnMnr0aP1z8ODBumhPWbRokTidzlKDtxrVHz161L29f/9+feyxY8ekTp060qVLF9m4caN+7Q2CPwDACE6LT/Vzenlut27ddGC/lPvvv1+30qgRfnHqy4IvkPYHAMAwjPwBAEawi0238rJybrAh+AMAjOBwej9vf+H54YK0PwAAhmHkj4tcW++ADGv7tbSpfUTqVj0tf/i0l6zMalLsCKf8seMm+U3L7yUuqkA2H06QKetvlH251QPYa8C6PkOOyq8fOCw165yTPd9VkZefvFJ2bK0a6G7BRxwWC/4cFs4NNuHzm8BnqlY+JzuO15KnNtxY4vsj2m6Ve5O/0QH/tx/eIT+drSyv9/pIoiLPVXhfAV/pevuPcv/kA/L29AR5sFcL2fNdjDyzcI/E1zob6K7BRxxis9zCRVAEf/Ws48aNG+s1jdU6xV999VWgu2S0NfsbyszN18un+4qP9l2cMujqb2TO19fobMCOH2vJuDXdpW6V09KjoectKUAoueP+o5K+sKb8e3FNydoZIy8+1kAKfrJJrwHHA901IPyCv1rLWC18MHnyZNm8ebNe3rBXr15y+PDhQHcNJWgQe0pPBaw/0MC9L+9stHx9pK50rJsT0L4B5VWpskOatzstm9fGuvc5nTbZsjZWklNOB7RvCN0V/oJZwIP/9OnTZcSIETJ06FBJTk6WuXPnStWqVfUTjhB86lQ5/w/hsZ+qeOw/dqaK1K7yU4B6BVgTV9MukZVEThzxLIP68WglqVGH6axwm/N3WGjhIqC/SWFhoWRkZEiP
Hj2KOhQRobc3bNhw0fEFBQV6HeTiDQAAhFDwV+sV2+12qVevnsd+tZ2Tc3EKOS0tTeLj490tKSmpAnsL5chP5yufa10wyq8V85McvSAbAISK3OORYj8nUv2CUX6N2ufkxwuyAQhdumjPaaFR8BcYEyZMkJMnT7pbdnZ2oLtknP2nYuXw6aqSmvg/975qlQulfZ3DsuVwQkD7BpTXubMRsvO/VaVjl1PufTabUzp0yZPvMrjVL1w4LVb6O8Mo+Af0K23t2rX1owgPHTrksV9tJyRcHEiio6N1g39VrXRWGsaddG83iM2VVjWPysmCaDmYHysLvm0rD7TPkH0n42V/XqyMuuY/cvinqvJpVuOA9huw4r2/1ZYxM7Plh6+ryo4tVaX/iCMSU9Uh/15UM9BdQ4g+1S+YBTT4R0VFSUpKiqxcuVL69eun9zkcDr09cuTIQHbNaG1qH5a3bvvQvf14p/P1F+/tbCET1v5SXv2mg1SpdE6evmG1xEUVSsbhBBn+ya+k0E56FKFr9Qc1JL6WXQaNzdFFfnu+rSJPDGwiJ45WDnTXAJ8L+L/W6jY/9Wzja6+9Vq6//nqZOXOm5Ofn6+p/BMZXOVdKy3m/v8QRNnlxy3W6AeHkgzdq64bwxAp/QRT877rrLjly5IhMmjRJF/l16NBB0tPTLyoCBADACtL+QRT8FZXiJ80PAIBBwR8AAH+zuj6/g2p/AABCC2n/IuFTvQAAAMqEkT8AwAiM/IsQ/AEARiD4FyHtDwCAYRj5AwCMwMi/CMEfAGAEp8Xb9ZwSPgj+AAAjMPIvwpw/AACGYeQPADACI/8iBH8AgBEI/kVI+wMAYBhG/gAAIzDyL8LIHwBgBKfTZrl5Y82aNdKnTx9JTEwUm80my5Yt83h/yJAhen/xduutt172urNnz5bGjRtLTEyMdOrUSb766ivxFsEfAAA/yM/Pl/bt2+tgXRoV7A8ePOhu//jHPy55zcWLF8vo0aNl8uTJsnnzZn39Xr16yeHDh73qG2l/AIAR1AI/Vhb5cXh5bu/evXW7lOjoaElISCjzNadPny4jRoyQoUOH6u25c+fKRx99JPPmzZPx48eX+TqM/AEARs35Oyw0JTc316MVFBSUu0+rVq2SunXrSsuWLeWBBx6QY8eOlXpsYWGhZGRkSI8ePdz7IiIi9PaGDRu8+lyCPwAAXkhKSpL4+Hh3S0tLk/JQKf8FCxbIypUr5S9/+YusXr1aZwrsdnuJxx89elS/V69ePY/9ajsnJ8erzybtDwAwQnmK9opznZudnS1xcXEeqfvyuPvuu92v27ZtK+3atZOrrrpKZwNuvvlm8SdG/gAAI/gq7R8XF+fRyhv8L9S0aVOpXbu27Nq1q8T31XuRkZFy6NAhj/1q25u6AYXgDwAwQkXf6uet/fv36zn/+vXrl/h+VFSUpKSk6GkCF4fDobdTU1O9+iyCPwAAfpCXlydbt27VTcnMzNSvs7Ky9Htjx46VjRs3yt69e3UA79u3rzRr1kzfuuei0v+zZs1yb6vb/F599VV588035fvvv9dFguqWQlf1f1kx5w8AMIIauVtZpc/p5bmbNm2S7t27ewRuZfDgwTJnzhz573//q4P4iRMn9EJAPXv2lKlTp3pMI+zevVsX+rncddddcuTIEZk0aZIu8uvQoYOkp6dfVAR4OQR/AIARnDqAWzvfG926dRPnJT7wk08+uew1VFbgQiNHjtTNCtL+AAAYhpE/AMAIaoU+9b/ysrI6YLAh+AMAjOCr+/zDAWl/AAAMw8gfAGAEVelvszB6d4TRyJ/gDwAwgiq8t1Tt75SwQdofAADDMPIHABiBgr8iBH8AgBEI/kUI/gAAI1DwV4Q5fwAADMPIHwBgBKr9ixD8AQAGBX8rc/4SNkj7AwBgGEb+AAAjUO1fhOAPADCCytpbydw7JXyQ9gcAwDCM/AEARiDtX4TgDwAwA3l/N4I/AMAMFkf+EkYjf+b8AQAwDCN/AIARWOGv
CMEfAGAECv6KkPYHAMAwjPwBAGZQI3cK/jSCPwDACMz5FyHtDwCAYRj5AwDMwCI/bgR/AIARqPb3Mvh/8MEHUla33357mY8FAABBGvz79etXpovZbDax2+1W+wQAgH+EUere7wV/DoejTI3ADwAI9rS/00Lzxpo1a6RPnz6SmJioB8fLli1zv3f27Fl57LHHpG3btlKtWjV9zKBBg+TAgQOXvOaUKVP0tYq3Vq1aVWy1/5kzZ6ycDgBAxRf8OS00L+Tn50v79u1l9uzZF713+vRp2bx5s0ycOFH/fO+992THjh1lmjq/+uqr5eDBg+62bt06/xf8qdH9s88+K3PnzpVDhw7JDz/8IE2bNtW/QOPGjWXYsGFedwIAgHDTu3dv3UoSHx8vK1as8Ng3a9Ysuf766yUrK0saNmxY6nUrVaokCQkJlvrm9cj/mWeekfnz58tzzz0nUVFR7v1t2rSR1157zVJnAADwH5sPmkhubq5HKygo8EnvTp48qdP41atXv+RxO3fu1NMEauA9cOBA/WXB78F/wYIF8re//U1/YGRkpHu/Sm1s377d6w4AABBKaf+kpCQ9cne1tLQ0y11T0+iqBmDAgAESFxdX6nGdOnXSA/D09HSZM2eOZGZmyo033iinTp3yb9r/f//7nzRr1uyi/argTxUwAAAQzrKzsz0CdHR0tKXrqdj529/+VpxOpw7ol1J8GqFdu3b6y0CjRo1kyZIlXk27ex38k5OTZe3atfrDinv33XelY8eO3l4OAICQWuEvLi7ukqPz8gT+ffv2yWeffeb1ddUUQYsWLWTXrl1ened18J80aZIMHjxYZwDUaN9VoaimA5YvX+7t5QAAMPKpfmd/DvxqDv/zzz+XWrVqeX2NvLw82b17t9x7773+nfPv27evfPjhh/Lpp5/qexPVl4Hvv/9e77vlllu8vRwAAGEpLy9Ptm7dqpui5ufVa1WgpwL/r3/9a9m0aZO8/fbb+k66nJwc3QoLC93XuPnmm/VdAC5jxoyR1atXy969e2X9+vXSv39/XX+nagX8vra/Ki648BYFAACCWUU/0nfTpk3SvXt39/bo0aP1T5U9V4v1uJbO79Chg8d5KgvQrVs3/VqN6o8ePep+b//+/TrQHzt2TOrUqSNdunSRjRs36tcV8mAf9UupEb+rDiAlJaW8lwIAIOye6tetWzddxFfq5crwbUKN8ItbtGiR+ILXwd/1reOLL75w34t44sQJ6dy5s+5UgwYNfNIxAADgH17P+Q8fPlzPVahR//Hjx3VTr1Xxn3oPAICgLvhzWmhhwuuRvyo0UEUGLVu2dO9Tr1966SVdCwAAQDCyOc+38rJybsgHf7WyUUmL+ahKRbXcIAAAQamC5/zDKu0/bdo0eeihh3TBn4t6PWrUKPnrX//q6/4BAIBAjPxr1KihHzZQ/DGFaklB9WQh5dy5c/r1fffdJ/369fN1HwEACLtFfoI++M+cOdP/PQEAwJ9I+3sX/NWCBAAAIDyUe5Ef1yMIiy9DqPjqYQcAAPgUI//yF/yp+f6RI0dK3bp19dr+qh6geAMAIKiDv9NCMzX4jxs3Tj92UD1zWD3D+LXXXpOnnnpK3+annuwHAADCLO2vnt6ngrxas3jo0KF6YZ9mzZpJo0aN9JOJBg4c6J+eAgBgBdX+5R/5q+V8mzZt6p7fV9uKerLQmjVrvL0cAAAVusKfzUIzNvirwK+eSay0atVKlixZ4s4IuB70AwAAwij4q1T/119/rV+PHz9eZs+eLTExMfLII4/I2LFj/dFHAACso+Cv/HP+Ksi79OjRQ7Zv3y4ZGRl63r9du3beXg4AAITSff6KKvRTDQCAYKbK9Sw91U8MC/4vvvhimS/4xz/+0Up/AABAMAT/GTNmlOli6uE/gQj+TV7aLpVsURX+uUBF+PjA1kB3AfCb3FMOqdGigj6MW/28C/6u6n4AAEIWy/uWv9ofAAAYXvAHAEBIYOTvRvAHABjB6ip9tjAK
/qT9AQAwDCN/AIAZSPtbG/mvXbtW7rnnHklNTZX//e9/et9bb70l69atK8/lAADwP5b3LX/w/+c//ym9evWSKlWqyJYtW6SgoEDvP3nypDz77LPeXg4AAAR78P/Tn/4kc+fOlVdffVUqV67s3n/DDTfI5s2bfd0/AAB8gkf6Wpjz37Fjh9x0000X7Y+Pj5cTJ054ezkAACoGK/yVf+SfkJAgu3btumi/mu9v2rSpt5cDAKBiMOdf/uA/YsQIGTVqlHz55Zd6Lf8DBw7I22+/LWPGjJEHHnjA28sBAIBgD/7jx4+X3/3ud3LzzTdLXl6engIYPny4/N///Z889NBD/uklAAAhNue/Zs0a6dOnjyQmJurB8rJlyzzedzqdMmnSJKlfv74uou/Ro4fs3LnzstedPXu2NG7cWGJiYqRTp07y1Vdf+T/4q1/giSeekOPHj8u2bdtk48aNcuTIEZk6darXHw4AQLim/fPz86V9+/Y6WJfkueeekxdffFEX0atserVq1fTddGfOnCn1mosXL5bRo0fL5MmTdZG9ur465/DhwxWzyE9UVJQkJyeX93QAAMJa7969dSuJGvXPnDlTnnzySenbt6/et2DBAqlXr57OENx9990lnjd9+nQ9/T506FC9rb44fPTRRzJv3jydmfdb8O/evbse/Zfms88+8/aSAAD4n9Xb9Zznf+Tm5nrsjo6O1s0bmZmZkpOTo1P9xe+aU2n8DRs2lBj8CwsLJSMjQyZMmODeFxERoa+hzvFr2r9Dhw46zeBqavSvOqTSD23btvX2cgAAhFTaPykpSQdqV0tLS/O6KyrwK2qkX5zadr13oaNHj4rdbvfqHJ+N/GfMmFHi/ilTpugCQAAAwll2drbExcW5t70d9YfVU/3UWv9qzgEAgHAe+cfFxXm08gR/tWaOcujQIY/9atv13oVq164tkZGRXp3j9+Cv5hvUbQcAAASjYFret0mTJjpgr1y50r1P1RKoqn/10LzSCu1TUlI8znE4HHq7tHN8lva/4447LqpYPHjwoGzatEkmTpzo7eUAAAhLeXl5HiviqiK/rVu3Ss2aNaVhw4by8MMP6+flNG/eXH8ZUDFUrQnQr18/9zlqTZ3+/fvLyJEj9ba6zW/w4MFy7bXXyvXXX6/vGFC3FLqq//0W/FVxQ3Gq0rBly5by9NNPS8+ePb29HAAAYWnTpk36DjkXFbgVFbznz58v48aN04H7/vvv18/G6dKli6Snp3tk0Xfv3q0L/VzuuusuvbaOWhxIFfmpInx1zoVFgJdjc6qhexmpKsMvvvhCV/XXqFFDAk2lSNSXkZur3yuVbFGB7g7gFx9/tzrQXQD8JveUQ2q02KMfC1+8iM4fseKqCc9KpIXpafuZM7I77XG/9rWieDXnrwoN1Oiep/cBAEJNMM35B5rXBX9t2rSRPXv2+Kc3AAAg+IK/Kk5QT/Bbvny5LvRT6ZTiDQCAoMXjfL0r+FMFfY8++qjcdtttevv222/3WOZXlQ6obVUXAABA0LEaxJ1iXvB/6qmn5Pe//718/vnn/u0RAAAIjuDvuimga9eu/uwPAAB+YbVoz2biyF+51NP8AAAIaqT9yxf8W7RocdkvAMePH/fmkgAAIJiDv5r3v3CFPwAAQgFp/3IG/7vvvlvq1q3rzSkAAAQH0v7e3+fPfD8AAIZW+wMAEJIY+Xsf/NUzgwEACFXM+Vt4pC8AACGJkX/51/YHAAChjZE/AMAMjPzdCP4AACMw51+EtD8AAIZh5A8AMANpfzeCPwDACKT9i5D2BwDAMIz8AQBmIO3vRvAHAJiB4O9G2h8AAMMw8gcAGEE9m9bK82ltEj4I/gAAM5D2dyP4AwCMwK1+RZjzBwDAMIz8AQBmIO3vRvAHAJgjjAK4FaT9AQAwDMEfAGBUwZ/NQvNG48aNxWazXdQefPDBEo+fP3/+RcfGxMSIP5D2BwCYoYLn/P/zn/+I3W53b2/btk1uueUW+c1vflPqOXFx
cbJjxw73tvoC4A8EfwAA/KBOnToe23/+85/lqquukq5du5Z6jgr2CQkJ4m+k/QEARvBV2j83N9ejFRQUXPazCwsL5e9//7vcd999lxzN5+XlSaNGjSQpKUn69u0r3377rfgDwR8AYFba32mhiejAHB8f725paWmX/ehly5bJiRMnZMiQIaUe07JlS5k3b568//77+ouCw+GQzp07y/79+8XXSPsDAOCF7OxsPTfvEh0dfdlzXn/9dendu7ckJiaWekxqaqpuLirwt27dWl555RWZOnWq+BLBHwBgBF8t7xsXF+cR/C9n37598umnn8p7773n1edVrlxZOnbsKLt27RJfI+0PADCDj9L+3nrjjTekbt268qtf/cqr89SdAt98843Ur19ffI3gDwAwQwCCv8Ph0MF/8ODBUqmSZ7J90KBBMmHCBPf2008/Lf/+979lz549snnzZrnnnnt01mD48OHia6T9AQDwE5Xuz8rK0lX+F1L7IyKKxuA//vijjBgxQnJycqRGjRqSkpIi69evl+TkZJ/3i+APADBCIB7p27NnT3E6Sz5x1apVHtszZszQrSIQ/AEAZuCpfm7M+QMAYBhG/gAAI9icTt3Ky8q5wYbgDwAwA2l/N9L+AAAYhpE/AMAIgaj2D1YEfwCAGUj7u5H2BwDAMIz8AQBGIO1fhOAPADADaX83gj8AwAiM/Isw5w8AgGEY+QMAzEDa343gDwAwRjil7q0g7Q8AgGEY+QMAzKAezGPl4TzO8EkbEPwBAEag2r8IaX8AAAzDyB8AYAaq/d0I/gAAI9gc51t5WTk32JD2BwDAMIz8USZtUk7Infftl2ZX50mtuoUy9aFk2bCydqC7BZTLopfqyhcfV5fsXdESFeOQ5GtPy7AnDkhSswL9fu6PkfLWXxNk8+pYOXwgSuJrnpPOt56UweMOSrW4MBr+mYa0vxsjf5RJTFWHZO6oJi9PbRborgCW/XfDFdJnyFGZuXynpC3aLfZzIo8PuErOnD7/T+LxQ5Xl2KHKMmLSAXnls+0yZmaWbFoVK9MfbRjorsMH1f42Cy1cBHTkv2bNGpk2bZpkZGTIwYMHZenSpdKvX79Adgml2LS2pm5AOHh24R6P7UdnZsldbdvKzv9Wkba/yJfGrc7IpNf2ut9PbFwoQx47KM891Eh/UYgkZxqauM8/OEb++fn50r59e5k9e3YguwHAcPm5kfpnbHX7JY+peoWDwI+wEND/jHv37q1bWRUUFOjmkpub66eeATCFwyEyd/KVcvV1eXrEX5KTxyJl4cwE6X3P0QrvH3yHRX5CdM4/LS1N4uPj3S0pKSnQXQIQ4mY93kD2ba8iE+bsK/H9/FMRMnFQU2nY4ozc+2hOhfcPfij4c1poYSKkgv+ECRPk5MmT7padnR3oLgEIYbMev1K+XBEnz727S+oknr3o/dN5EfLE766SKtUcMvn1TKlUOSDdBHwupGavoqOjdQMAK1Td1uwnrpT16fEy7d1dktCwsMQRvwr8laOc8tT8PRIVE0bDPkOR9g/R4I/Aialql8SGP7m36115Rpq2ypNTJyvJkYMxAe0bUJ5U/+dLa8iUN/ZIlSsccvzw+X8Kq8XaJbqKUwd+detfwU8RMu6lTDmdFymn886fG1/rnESerw9EqKHa343gjzJpfvUp+cub/3Vv3z/+/K1SK5bWkxlPtAxgzwDvLX/z/AJVY+9s7rH/0RlZ0vOu47Lrm6qyfXM1vW9o52SPY9788jtJSLo4UwCEkoAG/7y8PNm1a5d7OzMzU7Zu3So1a9aUhg1ZTCOYfPOf6nJb8k2B7gbgE58c2HrJ99t3zrvsMQg9FZ32nzJlijz11FMe+1q2bCnbt28v9Zx33nlHJk6cKHv37pXmzZvLX/7yF7ntttskrAr+Nm3aJB07dtRNGT16tH49adKkQHYLABCOAlDtf/XVV+tF7Fxt3bp1pR67fv16GTBggAwbNky2bNmiF71Tbdu2bRJWI/9u3bqJ
M4zmUAAAKK5SpUqSkJAgZfHCCy/IrbfeKmPHjtXbU6dOlRUrVsisWbNk7ty5YuytfgAABHpt/9zcXI9WfPG5C+3cuVMSExOladOmMnDgQMnKyir12A0bNkiPHj089vXq1Uvv9zWCPwDADA6n9SaiF5grvuCcWoCuJJ06dZL58+dLenq6zJkzR9e13XjjjXLq1KkSj8/JyZF69ep57FPbar+vUe0PADCDjx7pm52dLXFxce7dpa0/U3z5+nbt2ukvA40aNZIlS5boef1AIvgDAOAFFfiLB/+yql69urRo0cLjLrfiVG3AoUOHPPap7bLWDHiDtD8AwAg2q/P+Yv329t27d0v9+vVLfD81NVVWrlzpsU8V/Kn9vkbwBwCYtcKf00LzwpgxY2T16tX6nn11G1///v0lMjJS386nDBo0SD+zxmXUqFG6PuD555/XawGodQLULfEjR470+Z+CtD8AAH6wf/9+HeiPHTsmderUkS5dusjGjRv1a0VV/kdEFI3BO3fuLAsXLpQnn3xSHn/8cb3Iz7Jly6RNmzY+7xvBHwBghIpe4W/RokWXfH/VqlUX7fvNb36jm78R/AEAZvBRtX84YM4fAADDMPIHABjB5nTqVl5Wzg02BH8AgBkcP7fysnJukCHtDwCAYRj5AwCMQNq/CMEfAGAGqv3dCP4AADOUY5U+D2E08mfOHwAAwzDyBwAYoaJX+AtmBH8AgBlI+7uR9gcAwDCM/AEARrA5zrfysnJusCH4AwDMQNrfjbQ/AACGYeQPADADi/y4EfwBAEZged8ipP0BADAMI38AgBko+HMj+AMAzKBit5Xb9ZwSNgj+AAAjMOdfhDl/AAAMw8gfAGDQrX5W5vwlbBD8AQBmoODPjbQ/AACGYeQPADCDqvS3WTw/TBD8AQBGoNq/CGl/AAAMw8gfAGAGCv7cCP4AADMQ/N1I+wMAYBiCPwDArJG/00LzQlpamlx33XUSGxsrdevWlX79+smOHTsuec78+fPFZrN5tJiYGPE1gj8AwAwOHzQvrF69Wh588EHZuHGjrFixQs6ePSs9e/aU/Pz8S54XFxcnBw8edLd9+/aJrzHnDwAwQkXf6peenn7RqF5lADIyMuSmm24q/XNsNklISBB/YuQPAIAXcnNzPVpBQUGZzjt58qT+WbNmzUsel5eXJ40aNZKkpCTp27evfPvtt+JrBH8AgBl8NOeflJQk8fHx7qbm9i/H4XDIww8/LDfccIO0adOm1ONatmwp8+bNk/fff1/+/ve/6/M6d+4s+/fv9+mfgrQ/AMAMDqfK3Yul80UkOztbz8u7REdHX/ZUNfe/bds2Wbdu3SWPS01N1c1FBf7WrVvLK6+8IlOnThVfIfgDAOAFFfiLB//LGTlypCxfvlzWrFkjDRo08OajpHLlytKxY0fZtWuX+BJpfwCAGSr4Vj+n06kD/9KlS+Wzzz6TJk2aeN1lu90u33zzjdSvX198iZE/AMAQFlf4E+/OVan+hQsX6vl7da9/Tk6O3q/qBKpUqaJfDxo0SK688kp33cDTTz8tv/jFL6RZs2Zy4sQJmTZtmr7Vb/jw4eJLBH8AAPxgzpw5+me3bt089r/xxhsyZMgQ/TorK0siIoqS8D/++KOMGDFCf1GoUaOGpKSkyPr16yU5OdmnfSP4AwDMUMFr+zvLcPyqVas8tmfMmKGbvxH8AQBm0NX61qv9wwEFfwAAGIaRPwDADE7H+VZeVs4NMgR/AIAZKnjOP5gR/AEAZmDO3405fwAADMPIHwBgBtL+bgR/AIAZdNbfSvCXsEHaHwAAwzDyBwCYgbS/G8EfAGAGh7pP38K9+vr88EDaHwAAwzDyBwCYgbS/G8EfAGAGgr8baX8AAAzDyB8AYAaW93Uj+AMAjOB0OnQrLyvnBhuCPwDADGrO3sro3Rk+I3/m/AEAMAwjfwCAGfTInZG/QvAHAJhBrdBnszBvH0Zz/qT9AQAwDCN/AIAZSPu7EfwBAEZwOhzi
tJD2d5L2BwAAoYqRPwDADKT93Qj+AAAzqAV+bAR/hbQ/AACGYeQPADCDTttbuc+ftD8AACHF6XCK00La30nwBwAgxOhb9VjhT2HOHwAAP5o9e7Y0btxYYmJipFOnTvLVV19d8vh33nlHWrVqpY9v27atfPzxxz7vE8EfAGBO2t9i89bixYtl9OjRMnnyZNm8ebO0b99eevXqJYcPHy7x+PXr18uAAQNk2LBhsmXLFunXr59u27ZtE18i+AMAzEn7W21emj59uowYMUKGDh0qycnJMnfuXKlatarMmzevxONfeOEFufXWW2Xs2LHSunVrmTp1qlxzzTUya9Ys8aWQnvN3FV+ccxYGuiuA3+SeCp8lRYEL5eY5KqyY7pyctbTGzzl1vupzbq7H/ujoaN0uVFhYKBkZGTJhwgT3voiICOnRo4ds2LChxM9Q+1WmoDiVKVi2bJn4UkgH/1OnTumfq08uDnRXAL+p0SLQPQAq5t/z+Ph4v1w7KipKEhISZF2O9bnzK664QpKSkjz2qZT+lClTLjr26NGjYrfbpV69eh771fb27dtLvH5OTk6Jx6v9vhTSwT8xMVGys7MlNjZWbDZboLtjBPWNV/2Hr/7ucXFxge4O4FP8913x1IhfBX7177m/qMK5zMxMPRL3RX9tF8Sbkkb9wS6kg79KnzRo0CDQ3TCS+oeRfxwRrvjvu2L5a8R/4RcA1SpS7dq1JTIyUg4dOuSxX22rTERJ1H5vji8vCv4AAPDTdENKSoqsXLnSvc/hcOjt1NTUEs9R+4sfr6xYsaLU440c+QMAEMxGjx4tgwcPlmuvvVauv/56mTlzpuTn5+vqf2XQoEFy5ZVXSlpamt4eNWqUdO3aVZ5//nn51a9+JYsWLZJNmzbJ3/72N5/2i+APr6i5LVXcEopzXMDl8N83fO2uu+6SI0eOyKRJk3TRXocOHSQ9Pd1d1JeVlaWnsF06d+4sCxculCeffFIef/xxad68ua70b9OmjU/7ZXOG02LFAADgspjzBwDAMAR/AAAMQ/AHAMAwBH8AAAxD8IffHksJhIo1a9ZInz599CpzavU2X6+jDgQbgj/88lhKIJSo+67Vf9PqCy5gAm71Q5mokf51113nfqykWqVKrYH+0EMPyfjx4wPdPcBn1Mh/6dKl+hnqQLhi5I/Lcj2WUj2GsqyPpQQABC+CPy7rUo+l9PVjJgEA/kfwBwDAMAR/+OWxlACA4EXwh18eSwkACF481Q8+eSwlEMry8vJk165d7u3MzEzZunWr1KxZUxo2bBjQvgH+wK1+KDN1m9+0adPcj6V88cUX9S2AQKhbtWqVdO/e/aL96gvv/PnzA9InwJ8I/gAAGIY5fwAADEPwBwDAMAR/AAAMQ/AHAMAwBH8AAAxD8AcAwDAEfwAADEPwBwDAMAR/wKIhQ4ZIv3793NvdunWThx9+OCCr1NlsNjlx4kSpx6j3ly1bVuZrTpkyRa/maMXevXv156rlcgEEB4I/wjYgq4CjmnowUbNmzeTpp5+Wc+fO+f2z33vvPZk6darPAjYA+BoP9kHYuvXWW+WNN96QgoIC+fjjj+XBBx+UypUry4QJEy46trCwUH9J8AX1MBgACGaM/BG2oqOjJSEhQRo1aiQPPPCA9OjRQz744AOPVP0zzzwjiYmJ0rJlS70/Oztbfvvb30r16tV1EO/bt69OW7vY7Xb9hEP1fq1atWTcuHFy4eMxLkz7qy8fjz32mCQlJek+qSzE66+/rq/rephMjRo1dAZA9cv1yOS0tDRp0qSJVKlSRdq3by/vvvuux+eoLzQtWrTQ76vrFO9nWal+qWtUrVpVmjZtKhMnTpSzZ89edNwrr7yi+6+OU3+fkydPerz/2muvSevWrSUmJkZatWolL7/8std9AVBxCP4whgqSaoTvsnLlStmxY4esWLFCli9froNer169JDY2VtauXStffPGFXHHFFTqD4Drv+eef1095mzdvnqxbt06OHz8uS5cu
veTnDho0SP7xj3/opyB+//33OpCq66pg+s9//lMfo/px8OBBeeGFF/S2CvwLFiyQuXPnyrfffiuPPPKI3HPPPbJ69Wr3l5Q77rhD+vTpo+fShw8fLuPHj/f6b6J+V/X7fPfdd/qzX331VZkxY4bHMepRt0uWLJEPP/xQ0tPTZcuWLfKHP/zB/f7bb78tkyZN0l+k1O/37LPP6i8Rb775ptf9AVBB1FP9gHAzePBgZ9++ffVrh8PhXLFihTM6Oto5ZswY9/v16tVzFhQUuM956623nC1bttTHu6j3q1Sp4vzkk0/0dv369Z3PPfec+/2zZ886GzRo4P4spWvXrs5Ro0bp1zt27FBpAf35Jfn888/1+z/++KN735kzZ5xVq1Z1rl+/3uPYYcOGOQcMGKBfT5gwwZmcnOzx/mOPPXbRtS6k3l+6dGmp70+bNs2ZkpLi3p48ebIzMjLSuX//fve+f/3rX86IiAjnwYMH9fZVV13lXLhwocd1pk6d6kxNTdWvMzMz9edu2bKl1M8FULGY80fYUqN5NcJWI3qVRv/d736nq9dd2rZt6zHP//XXX+tRrhoNF3fmzBnZvXu3TnWr0XmnTp3c71WqVEmuvfbai1L/LmpUHhkZKV27di1zv1UfTp8+LbfccovHfpV96Nixo36tRtjF+6GkpqaKtxYvXqwzEur3y8vL0wWRcXFxHsc0bNhQrrzySo/PUX9Pla1Qfyt17rBhw2TEiBHuY9R14uPjve4PgIpB8EfYUvPgc+bM0QFezeurQF1ctWrVPLZV8EtJSdFp7AvVqVOn3FMN3lL9UD766COPoKuomgFf2bBhgwwcOFCeeuopPd2hgvWiRYv01Ia3fVXTBRd+GVFfegAEJ4I/wpYK7qq4rqyuueYaPRKuW7fuRaNfl/r168uXX34pN910k3uEm5GRoc8ticouqFGymqtXBYcXcmUeVCGhS3Jysg7yWVlZpWYMVHGdq3jRZePGjeKN9evX62LIJ554wr1v3759Fx2n+nHgwAH9Bcr1OREREbpIsl69enr/nj179BcJAKGBgj/gZyp41a5dW1f4q4K/zMxMfR/+H//4R9m/f78+ZtSoUfLnP/9ZL5Szfft2Xfh2qXv0GzduLIMHD5b77rtPn+O6piqgU1TwVVX+aoriyJEjeiStUuljxozRRX6qaE6l1Tdv3iwvvfSSu4ju97//vezcuVPGjh2r0+8LFy7UhXveaN68uQ7sarSvPkOl/0sqXlQV/Op3UNMi6u+i/h6q4l/dSaGozIEqUFTn//DDD/LNN9/oWyynT5/uVX8AVByCP/AzdRvbmjVr9By3qqRXo2s1l63m/F2ZgEcffVTuvfdeHQzV3LcK1P3797/kddXUw69//Wv9RUHdBqfmxvPz8/V7Kq2vgqeq1Fej6JEjR+r9apEgVTGvgqrqh7rjQE0DqFv/FNVHdaeA+kKhbgNUdwWoKntv3H777foLhvpMtYqfygSoz7yQyp6ov8dtt90mPXv2lHbt2nncyqfuNFC3+qmArzIdKluhvoi4+gog+NhU1V+gOwEAACoOI38AAAxD8AcAwDAEfwAADEPwBwDAMAR/AAAMQ/AHAMAwBH8AAAxD8AcAwDAEfwAADEPwBwDAMAR/AADELP8PxswLQRLz3NoAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "evaluator_task = create_task_function(eval_prompt, model=\"gpt-4.1-nano\")\n",
    "print(\"Running evaluator experiment...\")\n",
    "experiment = run_experiment(\n",
    "    dataset=test_dataset,\n",
    "    task=evaluator_task,\n",
    "    evaluators=[eval_tp, eval_tn, eval_fp, eval_fn, accuracy],\n",
    "    concurrency=3,\n",
    ")\n",
    "experiment_id = experiment.id\n",
    "print(f\"Experiment completed! Experiment ID: {experiment_id}\")\n",
    "\n",
    "# View experiment results\n",
    "results = retrieve_results(experiment_id)\n",
    "compute_metrics(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Save Results\n",
    "\n",
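    "Awesome! The judge performs well on the test set. Let's save its performance metrics (TPR and TNR) and its test-set predictions; we will use them later to correct the judge's success-rate estimate on the full set of traces."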
   ]
  },
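  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two judge metrics we carry forward are its true positive rate (TPR) and true negative rate (TNR) on the test set. The definitions below treat PASS as the positive class, which is an assumption about how the labels are encoded in this walkthrough:\n",
    "\n",
    "$$\n",
    "\\text{TPR} = \\frac{TP}{TP + FN}, \\qquad \\text{TNR} = \\frac{TN}{TN + FP}\n",
    "$$\n",
    "\n",
    "TPR is the fraction of genuinely passing traces that the judge also labels PASS. TNR is the fraction of genuinely failing traces that it labels FAIL. These are the `tpr` and `tnr` values hard-coded in the next cell."
   ]
  },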
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Saved performance metrics to results/judge_performance.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mSaved performance metrics to results/judge_performance.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Saved test predictions to results/test_predictions.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mSaved test predictions to results/test_predictions.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Saved judgy test data to results/judgy_test_data.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mSaved judgy test data to results/judgy_test_data.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from scripts.evaluate_judge import save_results\n",
    "\n",
    "results_dir = \"results\"\n",
    "test_traces = pd.read_csv(\"data/test_set.csv\")\n",
    "\n",
    "# Judge TPR and TNR measured on the test set in the experiment above\n",
    "tpr = 0.957\n",
    "tnr = 1.000\n",
    "\n",
    "predictions_data = []\n",
    "for idx, entry in enumerate(results):\n",
    "    # Pair each experiment result with its test-set row by position\n",
    "    # (this assumes `results` preserves the row order of test_set.csv)\n",
    "    prediction = entry.get(\"output\", {})\n",
    "    test_data = test_traces.iloc[idx]\n",
    "\n",
    "    predictions_data.append(\n",
    "        {\n",
    "            \"ground_truth_label\": test_data.get(\"ground_truth_label\"),\n",
    "            \"llm_as_judge_label\": prediction.get(\"label\"),\n",
    "            \"explanation\": prediction.get(\"explanation\"),\n",
    "            \"attributes.query\": test_data.get(\"attributes.query\"),\n",
    "            \"attributes.dietary_restriction\": test_data.get(\"attributes.dietary_restriction\"),\n",
    "            \"attributes.output.value\": test_data.get(\"attributes.output.value\"),\n",
    "        }\n",
    "    )\n",
    "\n",
    "predictions = pd.DataFrame(predictions_data)\n",
    "save_results(tpr, tnr, predictions, results_dir)"
   ]
  },
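  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below, we run the judge over all traces in Phoenix and use `judgy` to turn the raw PASS rate into a bias-corrected estimate with a confidence interval. The point estimate can be sketched as follows, assuming `judgy` applies the standard Rogan-Gladen correction with the test-set TPR and TNR:\n",
    "\n",
    "$$\n",
    "\\hat{\\theta} = \\frac{p_{\\text{obs}} + \\text{TNR} - 1}{\\text{TPR} + \\text{TNR} - 1}\n",
    "$$\n",
    "\n",
    "Here $p_{\\text{obs}}$ is the fraction of traces the judge labels PASS. With $p_{\\text{obs}} = 0.827$, $\\text{TPR} \\approx 0.957$ and $\\text{TNR} = 1.000$:\n",
    "\n",
    "$$\n",
    "\\hat{\\theta} \\approx \\frac{0.827 + 1.000 - 1}{0.957 + 1.000 - 1} = \\frac{0.827}{0.957} \\approx 0.864\n",
    "$$\n",
    "\n",
    "This is close to the corrected rate of 0.865 reported below; the small gap is presumably due to rounding of the inputs. Because TNR is 1.0, the judge never passes a failing trace. Its only errors are missed passes, so the correction can only raise the estimate."
   ]
  },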
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000\">Loading traces from Phoenix...</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[33mLoading traces from Phoenix\u001b[0m\u001b[33m...\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Loaded </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">1000</span><span style=\"color: #008000; text-decoration-color: #008000\"> traces from Phoenix</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mLoaded \u001b[0m\u001b[1;32m1000\u001b[0m\u001b[32m traces from Phoenix\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000\">Running judge on </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">1000</span><span style=\"color: #808000; text-decoration-color: #808000\"> traces with Phoenix evals...</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[33mRunning judge on \u001b[0m\u001b[1;33m1000\u001b[0m\u001b[33m traces with Phoenix evals\u001b[0m\u001b[33m...\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "llm_generate |██████████| 1000/1000 (100.0%) | ⏳ 02:54<00:00 |  8.51it/s"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Completed labeling of </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">1000</span><span style=\"color: #008000; text-decoration-color: #008000\"> traces</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mCompleted labeling of \u001b[0m\u001b[1;32m1000\u001b[0m\u001b[32m traces\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Completed LLM-as-Judge Evaluation, logged to Phoenix</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mCompleted LLM-as-Judge Evaluation, logged to Phoenix\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Completed evaluation of 1000 traces\n",
      "Raw success rate: 0.827\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "<span style=\"font-weight: bold\">Final Results:</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "\u001b[1mFinal Results:\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">==============================\n",
       "</pre>\n"
      ],
      "text/plain": [
       "==============================\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #000080; text-decoration-color: #000080\">Raw Observed Success Rate: </span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">0.827</span><span style=\"color: #000080; text-decoration-color: #000080\"> </span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">(</span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">82.7</span><span style=\"color: #000080; text-decoration-color: #000080\">%</span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[34mRaw Observed Success Rate: \u001b[0m\u001b[1;34m0.827\u001b[0m\u001b[34m \u001b[0m\u001b[1;34m(\u001b[0m\u001b[1;34m82.7\u001b[0m\u001b[34m%\u001b[0m\u001b[1;34m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Corrected Success Rate: </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">0.865</span><span style=\"color: #008000; text-decoration-color: #008000\"> </span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">86.5</span><span style=\"color: #008000; text-decoration-color: #008000\">%</span><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mCorrected Success Rate: \u001b[0m\u001b[1;32m0.865\u001b[0m\u001b[32m \u001b[0m\u001b[1;32m(\u001b[0m\u001b[1;32m86.5\u001b[0m\u001b[32m%\u001b[0m\u001b[1;32m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">95</span><span style=\"color: #808000; text-decoration-color: #808000\">% Confidence Interval: </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">[</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">0.827</span><span style=\"color: #808000; text-decoration-color: #808000\">, </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">0.965</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">]</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;33m95\u001b[0m\u001b[33m% Confidence Interval: \u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m0.827\u001b[0m\u001b[33m, \u001b[0m\u001b[1;33m0.965\u001b[0m\u001b[1;33m]\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000\">                        </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">[</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">82.7</span><span style=\"color: #808000; text-decoration-color: #808000\">%, </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">96.5</span><span style=\"color: #808000; text-decoration-color: #808000\">%</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">]</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[33m                        \u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m82.7\u001b[0m\u001b[33m%, \u001b[0m\u001b[1;33m96.5\u001b[0m\u001b[33m%\u001b[0m\u001b[1;33m]\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080\">Correction Applied: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.038</span><span style=\"color: #008080; text-decoration-color: #008080\"> </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3.8</span><span style=\"color: #008080; text-decoration-color: #008080\"> percentage points</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[36mCorrection Applied: \u001b[0m\u001b[1;36m0.038\u001b[0m\u001b[36m \u001b[0m\u001b[1;36m(\u001b[0m\u001b[1;36m3.8\u001b[0m\u001b[36m percentage points\u001b[0m\u001b[1;36m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "llm_generate |██████████| 1000/1000 (100.0%) | ⏳ 02:56<00:00 |  5.68it/s\n"
     ]
    }
   ],
   "source": [
    "from scripts.run_full_evaluation import (\n",
    "    compute_metrics_with_judgy,\n",
    "    load_test_data,\n",
    "    load_traces_from_phoenix,\n",
    "    print_interpretation,\n",
    "    run_judge_on_traces,\n",
    "    save_final_results,\n",
    ")\n",
    "\n",
    "# Run the judge over every trace stored in Phoenix\n",
    "all_traces = load_traces_from_phoenix()\n",
    "binary_predictions, predictions_df = run_judge_on_traces(eval_prompt, all_traces)\n",
    "print(f\"Completed evaluation of {len(binary_predictions)} traces\")\n",
    "print(f\"Raw success rate: {np.mean(binary_predictions):.3f}\")\n",
    "\n",
    "# Load the test-set labels and judge predictions saved earlier\n",
    "results_dir = \"results\"\n",
    "judgy_path = f\"{results_dir}/judgy_test_data.json\"\n",
    "test_labels, test_preds = load_test_data(judgy_path)\n",
    "\n",
    "# Correct the raw success rate for judge errors and compute a 95% CI\n",
    "theta_hat, lower_bound, upper_bound, raw_success_rate = compute_metrics_with_judgy(\n",
    "    test_labels, test_preds, binary_predictions\n",
    ")\n",
    "\n",
    "print_interpretation(theta_hat, lower_bound, upper_bound, raw_success_rate)\n",
    "save_final_results(\n",
    "    theta_hat, lower_bound, upper_bound, raw_success_rate, len(all_traces), results_dir\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base2",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
